UNCLASSIFIED


INFORMATION SCIENCE

OUTLINE,  ASSESSMENT,  INTERDISCIPLINARY  DISCUSSION 

by 

A.  S.  Iberall 

GENERAL  TECHNICAL  SERVICES,  INC. 

Yeadon, Penna.


Report  No.  1:  Information  Science 
Contract No. DA 49-092-ARO-114


UNCLASSIFIED


UNCLASSIFIED 


Report No. 1: Information Science
Contract No. DA 49-092-ARO-114


INFORMATION SCIENCE

OUTLINE, ASSESSMENT, INTERDISCIPLINARY DISCUSSION

by

A. S. Iberall

GENERAL TECHNICAL SERVICES, INC.

Yeadon, Penna.


prepared for

ARMY RESEARCH OFFICE
Arlington, Va.


June 1966


Reproduction  in  whole  or  in  part 
is  permitted  for  any  purpose  of 
the  United  States  Government. 


UNCLASSIFIED 


NOTICES 


Distribution 


Distribution  of  this  document  is  unlimited. 


Disposition 

Destroy  this  report  when  no  longer  needed.  Do  not  return  it 
to  the  originator. 


Disclaimer 

The  findings  in  this  report  are  not  to  be  construed  as  an  official 
Department  of  the  Army  position  unless  so  designated  by  other  authorized 
documents.


Rights 


Reproduction  in  whole  or  in  part  is  allowed  for  purposes  of  the 
U.  S.  Government. 


Availability 

Qualified  U.  S.  Government  requesters  may  obtain  copies  of  this 
document  from  the  Defense  Documentation  Center,  Cameron  Station, 
Alexandria,  Virginia. 




FOREWORD 


Information  Science  has  been  something  of  a  disappointment,  even 
to  those  who  have  been  most  enthusiastic  about  the  opportunities  it 
presents  and  about  its  ultimate  value  or  universality.  So  far,  it 
has  not  developed  into  either  a  generally  useful  set  of  tools  for 
problem-solving  or  into  a  coherent  theory  of  the  abstract  information 
process,  independent  of  context.  Nevertheless,  every  scientist  finds 
himself  studying  these  processes  and  wishing  that  he  had  a  better 
insight  as  to  their  meaning  or  significance  in  his  science. 

This  Report,  comprising  an  introduction  and  assessment  of  the 
interdisciplinary  literature  in  three  major  aspects  of  the  subject, 
is  largely  a  personal  contribution,  partially  speculative  in  nature. 
It will have accomplished its principal purpose if it helps Army
scientists to become more familiar with Information Science, and
incidentally generates some interesting and lively controversies.

(In  addition  to  the  named  author,  Dr.  S.  Z.  Cardon  and  E.  Young 
have  also  contributed  ideas  to  the  Report.) 


G.  H.  McCLURG 
Contracting  Officer's 
Technical  Representative 
Contract No. DA 49-092-ARO-114


ABSTRACT 


This Report provides an assessment of, and introduction to, the
interdisciplinary literature of three aspects of Information Science,
in annotated bibliography form. These are: communication networks;
human information processes, principally language and information
retrieval; and the large cybernetic systems such as the human
brain and central nervous system.


INTRODUCTION


A background for research planning and management was issued in an
earlier phase (1). This second phase has involved a number of specific
tasks.


The assignment in this task was to outline, assess, and add
interdisciplinary discussion in depth of the field of information science,
in extension of work started in (1).


OUTLINE  AND  SUMMARY  OF  THE  EARLIER  WORK 


From (1), the following salient ideas may be abstracted.

1.  Information  science  is  concerned  with  storage  and  flow  of 
information  within  systems. 

2. A system may be defined, for this task, as a logical structure,
whose description is built up on the basis of a metalanguage to permit
talking about forms (things) and functions; upon definitions that focus
attention and propose particular elements for study; upon axioms that
represent an assumption of certain logically defined properties; upon a
methodology for operational manipulation; and upon various tests for the
completeness of the entire structure.


3. Not all systems are complete, so that commonly one deals
with systems of incomplete specification.

4. The systems of interest are generally viewed in two contexts:
one, the paper system that was logically described thus far, and two, an
actual physical system of structural form and function which the paper system
attempts to describe, i.e., to which it will correspond in some formal sense
in form and function.


5. To the mathematical scientist, the paper system stands by
itself. To the physical scientist, the paper system is designed to be an
isomorphic 'scorekeeping' system, but the real problem is to describe
physically derivable phenomena. The mathematical scientist may thus be
concerned with simplicity and logical rigor in his system descriptions. The
physical scientist must also be concerned with a logical system. However, he
may be concerned with a less complete paper system, and permit modifications
ad lib of the descriptive foundations to bring the system science into closer
conformity with reality.


6. Much of the development in information science, historically,
has taken the mathematical logical descriptive path. However, one should seek
to enrich the field of systems science from a physical view.

7. Physically founded systems science will be concerned with
the description of the static (time independent) and dynamic (time dependent)
characteristics of real complex systems in terms of the fundamental functions
of the mechanisms that make up the system.

8. Information science will be concerned with the abstracted
content of the fluxes of mass and energy and their transformations within
physical systems that change in time, and their transformation back and forth
to time-independent form.

9. More precisely, information science may be defined as being
concerned with the formulation, abstraction, codification, translation,
transmission, retrieval, reconstruction and storage of coherence in fluxes or
potentials that traverse systems in space and time. Coherence is undefined.
It is whatever the sender wants it to be, generally what a human sender
proposes to regard as coherent.

10. Thus information science has been developing around three
problems - the theory of information in the network; the nature of the human
type of information handling, of its storage and retrieval; and the nature of
the human informational system, i.e., a theory of the brain.

11. The hierarchy of systems that are generally involved as
information science problems is given in the following provisional list:

systems of entities - 'things'

systems of relations

systems of functions

physical networks (i.e., from electrical networks to 'brains')

isomorphic  naming  systems 

information  storage  systems  -  'libraries'  or  books 

physical  networks 

manual  changing 
D.C.  networks 

dynamic  systems  near  equilibrium  (vibrations) 

automatic  control  systems 

non-linear  dynamic  systems 

non-linear  control  systems 

adaptive  control  systems 

cybernetic  governing  machines 

humans  (homeostatic  systems) 

social  organisations. 

12. The problems treated as part of a theory of information of
the network have been:

a. At the lowest level: given a class of input 'patterns,'
how does a particular class of elementary networks transform these inputs into
outputs?


b. At the next level, in which there can be functional
network changes, e.g., switch networks, how do inputs transform? This contains
computer theory.


c. At the next level, how may fixed or functionally
changeable networks be synthesized to provide specific input-output transforms?
This contains the electrical network problem, automatic control problem and
part of the adaptive network theory.


13. The problems treated in human information storage and
retrieval have included:

a. At the lowest level, the library problem of indexing,
storing, abstracting information.

b. At the next level, information content in documents,
their coding, storage, and retrieval.

c. At the higher levels, machine translation, pattern
recognition, and more complete abstractions of automata handling of
information from input to output.

14.  The  problems  treated  in  the  brain  system  are: 

a.  Detailed  characteristics  of  the  nerve  and  neural  net. 

b. Automata, cybernetic machines, and their simulation of
brain functions.

c. Mind, brain, and behavior from a mechanistic view.


CLASS 1 PROBLEM - INFORMATION THEORY IN THE NETWORK


1.  SOURCES  OF  INFORMATION 


The statistical mechanics of systems dates back seriously to Maxwell
and Boltzmann. Worthwhile reading to bridge the gap from the molecular
foundations of systems, through their mechanical-thermodynamic relations of
change, fluctuation and noise within the system, to the theory of noise, are
Gibbs (2), Fowler (3), Tolman (4), Kennard (5), and Chandrasekhar (6). The
sources of fluctuations in space and in time are discussed. The foundation in
statistical mechanics for handling such problems as fluctuations is laid.
Einstein's 1905 treatment of Brownian motion is covered, as well as his much
more general 1910 theory of fluctuations. Tolman, in particular, is worthy of
review many times over, even though specific theory immediately applicable to
the network is not contained therein. Chandrasekhar gives illustrations from a
variety of physical problems.

That statistical mechanical noise existed in networks, particularly
electrical networks, was quite well understood. A useful review is contained
in Moullin (7). Most of the discussion is taken up with the Schottky effect,
and with the Nyquist theory of Johnson noise (1928).

A suitable introduction to random processes, as they soon became generally
applied to electrical networks, was given by Rice (8).

The book that has become classic as an introduction to signal and noise
in electrical networks is Lawson and Uhlenbeck (9). Basically an applied
science book, it showed briefly how, for random processes in general and for
various statistical mechanical processes in particular, noise was the limiting
factor in the transmission or acquisition of 'information.' This book, it
would seem, was in the main line of the mathematical physical development.

More modern examples of the analysis of physical noise are Van der Ziel (10),
Bennett (11), or Bell (12).

The mathematical-engineering line of what is commonly referred to as
'information theory' takes a different path. Illustrative of its development
are the papers by Nyquist (13), Hartley (14), Gabor (15), Kolmogoroff (16),
Wiener (17), and Shannon (18). The two 1948 papers of Shannon are commonly
viewed as the starting point of the modern statistical theory of
communications, or 'information theory'; one may then add Tuller's paper (19).
A review of the extensive literature that quickly came into existence by 1951
is given by Cherry (20). An enriching view of the content of Shannon's
information theory may be found in Pierce (21).

(It is not clear, without much more extensive review, why these two
lines, the physical statistical mechanics of systems and the engineering
information theory of networks, took such a course of divergence. It is quite
clear in Nyquist's 1924 or 1928 papers that the problem areas were connected
in his mind, and similarly in Khinchin (22) or Brillouin (23) that the
problem areas are connected. As Pierce indicates, Shannon's work had led to an
extensive literature on coding theory. However, it is far from clear that
fundamental advances can come without statistical mechanics or thermodynamics,
although Shannon's and Wiener's work may color or bias the attitude of the
worker in this field.)

In Pierce's view, information theory, in the coding area, deals with
"the many problems that have been troubling communication engineers for years."
Most of his discussion is concerned with coding of information, in particular
the coding and transmission of information over networks with a noisy channel.
For example, in discussing the questions that just hadn't been asked before
Shannon, he illustrates with "Suppose that I told you that, if the sort of
noise in the channel is known and if its magnitude is known, I can calculate
just how many characters I can send over the channel per second and that, if I
send any number fewer than this, I can do so virtually without error, while if
I try to send more, I will be bound to make errors," and he points out, in the
problems of encoding "messages for error-free transmission over noisy
channels," that "Shannon's very general work tells us in principle how to
proceed," "how much wiser we are than in the days before information theory,"
and "we know in principle how well we can do, and the result has astonished
engineers and mathematicians."
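Pierce's claim can be made concrete for one special case. The sketch below assumes a band-limited channel with additive white Gaussian noise, for which the Shannon-Hartley formula C = B log2(1 + S/N) gives the capacity. Neither that formula nor the numbers used here appear in the text above, and the conversion to characters per second further assumes equally likely, independent characters. The function names are illustrative only.

```python
import math


def awgn_capacity_bits_per_second(bandwidth_hz, signal_to_noise):
    """Shannon-Hartley capacity C = B * log2(1 + S/N) (assumed AWGN channel)."""
    return bandwidth_hz * math.log2(1.0 + signal_to_noise)


def max_characters_per_second(bandwidth_hz, signal_to_noise, alphabet_size):
    """Upper bound on character rate, assuming equally likely independent characters."""
    bits_per_character = math.log2(alphabet_size)
    capacity = awgn_capacity_bits_per_second(bandwidth_hz, signal_to_noise)
    return capacity / bits_per_character


# Hypothetical channel: 3000 Hz bandwidth, power signal-to-noise ratio 1000 (30 dB).
capacity = awgn_capacity_bits_per_second(3000.0, 1000.0)
characters = max_characters_per_second(3000.0, 1000.0, 32)
print(f"capacity: {capacity:.0f} bits/s, about {characters:.0f} characters/s")
```

Below the computed rate, error-free transmission is possible in principle; the calculation says nothing about how to construct a code that achieves it.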

In the chapter on information theory and physics, his summary makes the
following points. Various physical phenomena produce noises that interfere
with signals used for transmission. It is questionable to argue the relation
of the concept of the entropy of physics and that of communication theory.
While attempts have been made to use information theory in statistical
mechanics, it would be more useful to get the physical limitations imposed on
information transmission by quantum effects.

(Thus the disciplines of mathematician, physicist, and engineer are
still concerned with the physical laws that determine and limit the
performance of systems, laws of energetics and power, statistical mechanics,
as well as the content that has crept in through coding problems. Thus
information theory, even in its lowest level 'communications theory' problem,
remains an exercise involving many disciplines.

The scientific problem stems from the following: Bell Labs undertook
to develop communications technology. "Communications is our business" is
their watchword. This has always included developing whatever applied science
they needed, though not always done in a systematic way. This is how it
should be. Whether the material need be systematized is an academic question,
whether done by academics or internally at Bell Labs. Furthermore, since Bell
Labs did not have an absolute monopoly on brains, there were some
contributions from outsiders. It should be noted that the subject of
'communications theory' was an interdisciplinary thing - not necessarily
unitary, except for the particular company interest. Thus the science does
not have to grow up neatly and tidily. Nevertheless, most of the scientific
pieces (although likely not systematic) could be obtained from the Bell Labs
series by a person with broad background.)

As a simpler illustration of modern information theory in communications
systems, one may inspect such books as Baghdady (24), Grabbe (25), Brown and
Glazier (26). Pinsker (27) is a more abstract and complex treatment.

Brown and Glazier offer a useful outlined path through the problems.
They start from the basic methods used in electrical communications and
discuss the nature of the signal, in time and frequency form, the forms of
modulation, the properties of communications channels, and the response of
linear channels. They characterize noise and discuss the elementary
information theory and information capacity of a channel. They discuss Rice's
1944 paper.


Baghdady, on the other hand, is an excursion in modern approaches to
communications systems, and thus includes working information theory
references and theory in a number of chapters.

There is no point in laying down a foundation in electrical networks,
or communications networks. Some more pertinent books are - in some semblance
of a temporal sampler - Shea (28); Guillemin (29); Bode (30); and Cherry (31).
Shannon and Weaver's book (18) will be found to fit into this sequence quite
well as a specialized topic. Black (32), Tuttle (33), and Reich (34)
illustrate aspects of post-war network analysis. Modern books are Weinberg
(35), Chen (36), or of the current genre, Zadeh (37).

As illustration of communications theory books that take information
theory into account, there are Middleton (38), Kotel'nikov (39), or Wozencraft
(40), Bennett (11), Bell (42), Wolfowitz (48), Reza (44), and Abramson (43).

For working texts in information theory, there are Pierce (21),
Brillouin (41), Bell (42), Shannon and Weaver (18), Abramson (43), Reza (44),
Meyer-Eppler (45), Khinchine (47), Wolfowitz (48), Feinstein (49), and
Peterson (50).


2. OUTLINE


With the many sources on information theory in the network, it would be
wasteful to do more than briefly outline the problem.

1. Information theory is a problem area that lies within the
subject of communications engineering - i.e., it is the study of transmission
of 'intelligent' signal information from one point to another, generally by
electrical means. A knowledge of electrical circuit theory and its current
analytic techniques is assumed.

2. The signalling operations are performed within a system
which may be viewed as a discrete message source, an encoder (generally by
modulation of a carrier signal that provides transmissible power), a
transmitter, a transmitting channel (in part modifiable by added electrical
networks), a receiver, a decoder (generally by demodulation), and the final
message receiver.
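The chain in item 2 can be sketched as a toy simulation. This is a minimal sketch under loud assumptions, none of which come from the report: the 32-symbol alphabet is hypothetical, encoding is a fixed 5-unit binary code rather than modulation of a carrier, and the channel is modelled as flipping each binary unit independently with a given probability.

```python
import random

# Hypothetical 32-symbol alphabet (26 letters plus 6 punctuation marks).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,?!'"
CODE_LENGTH = 5  # 2**5 = 32 code words, one per symbol


def encode(message):
    """Encoder: map each character to CODE_LENGTH binary signalling elements."""
    bits = []
    for ch in message:
        index = ALPHABET.index(ch)
        bits.extend((index >> k) & 1 for k in reversed(range(CODE_LENGTH)))
    return bits


def channel(bits, flip_probability, rng):
    """Noisy channel (assumed model): flip each element independently."""
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]


def decode(bits):
    """Decoder: regroup the elements into code words and look up the characters."""
    chars = []
    for i in range(0, len(bits), CODE_LENGTH):
        index = 0
        for b in bits[i:i + CODE_LENGTH]:
            index = (index << 1) | b
        chars.append(ALPHABET[index])
    return "".join(chars)


rng = random.Random(0)  # seeded so that runs are repeatable
sent = "HELLO WORLD"
print(decode(channel(encode(sent), 0.0, rng)))   # noiseless channel: message recovered
print(decode(channel(encode(sent), 0.05, rng)))  # noisy channel: some characters may change
```

With no redundancy in the code, every flipped element corrupts a character; the coding problem taken up in the following items is how to add redundancy economically.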

3. Communications engineering, and thus information theory in
this sense, is not concerned with the specific content of the discrete
message, but with a class of all such messages, out of which specific messages
are viewed as drawn at random. (The next two information science problems
deal with the content and the reason for the messages, also as classes.)

4. While communications engineering is concerned with system
design, analysis, and synthesis of communications networks, and their
characteristics and problems in general, information theory is restricted to
the nature of coding.

5. It is implicit that communications systems are limited by
the laws of physics that determine the behavior of systems, so that part of
the signals that pass through the system are not parts of the discrete
desired message transmission, but are extraneous characteristics of
statistical mechanical properties and thermodynamic-mechanical couplings of
and to the system. This may be viewed as part of the physics of systems,
whereas information theory is only concerned with the problem of 'economical'
coding of message signals selected at random in the face of noise.

6. Ordinarily extraneous 'noise' in a system is not an important
factor in engineering considerations. It becomes so:

a. When design reaches a sufficiently advanced state that
the essential 'noise' limitations restrict design, or rather restrict the
achievable sensitivity (examples: the sensitivity limit of galvanometer
design is determined by Brownian motion, or the sensitivity limit of kinematic
linkages is generally the irreducible mechanical friction in the design type).

b. When the signal power is quite small relative to the
noise sources. (Examples exist of many attempts to use some very small
physical effect as the basis for an instrument measure, when it is generally
swamped by many large 'error' sources. The concepts of error and of noise are
to a considerable extent interchangeable. The former comes from mechanical
practice, the latter from electrical practice.)

c. When the available transferring or transmitting channel
or conduit or path is used to carry more than one flux to the point that the
cumulative uncertainties that separate these fluxes are an appreciable
proportion of the fluxes.

7. Information theory in the network is most often concerned with
the latter, the economical coding of one or more messages in the presence of
noise or error sources. It therefore has only limited interest in the general
physical limitations of systems, or in the general design of communications
networks, or in the message content, or why messages are being sent in the
first place.


8.  Information  theory  may  be  viewed  as  starting  with  Nyquist's 
1924  work  (13)  which  dealt  with  relating  the  transmitting  of  the  maximum 
amount  of  information  to  the  number  of  signalling  elements. 

9. It is desirable that a common language be used for the
following exposition and discussion.

In human transmission, letters (actually phonemes) are
organized by meaningful words into messages.

In machine transmission, signal elements are organized by
ordered arrays into messages.

Signal elements, letters, sending units, enunciatable symbols,
pulses, units are all equivalent terms or concepts for the intrinsic elements
that the information 'generator' can generate. For example, the 26 word
alphabet stems from the twenty odd distinct combinations that can be formed
with the mouth by lip position, tongue position, and use of voicing by the
vocal cords. The 10 symbols for a numerical alphabet stem, roughly, from
the number of fingers. Binary transmission signals stem from a recognizable
two state alphabet that the primitive electrical networks of telegraphy could
use.


Meaningful words, ordered arrays, n-tuple ordered arrays,
ordered sending arrays, are all equivalent concepts for the higher ordered
information elements that the information generator can generate and that the
information system can handle. These arrayed elements are stored in a
dictionary or code book.

Messages are higher ordered information elements made up at
random out of words as far as the information system and receiver are
concerned. What is important here is the random make-up. If the receiver knows
the message, then its elements are not actual words but whatever meaningful
cues were contained in the message. These are the real 'words.'

More generally, the information generator selects signal
(letter) elements from its internal alphabet and encodes them into meaningful
ordered (word) arrays selected from its internal dictionary so as to form
finite ordered (message) arrays consistent with its internal repertoire.

While this choice of conceptual language may not be perfect,
or in strict accord with current information theory usages, it will be
convenient to bridge most gaps from human communication to machine
communication to the brain.


10. Nyquist (13) treated two problems - the optimal form of wave
shape of a signalling element in a transmission network for greatest speed with
adequate separation from other signal elements; and optimal choice of code to
transmit the maximum information with a given set of signal elements. The
first problem is detailed and technical. It states that a simple pulse does
not remain a simple pulse after passing through a network. Thus if what is
wanted in the output is a simple pulse - because of its excellent separation
characteristics - then one should take into account the pulse form deformation
of the specific types of networks (telegraphic, radio and carrier circuits,
land lines, submarine cables). By treating an inverse transformation problem,
the best forms are estimated. Typically it is not a rectangular pulse or a
half-sine pulse, but a small wave train with a considerable central
pulse-like nature. These details are not of great concern in the present
discussion. (They are of concern to circuit designers.)
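The statement that a simple pulse does not remain a simple pulse can be illustrated with a toy network. The sketch below is an assumption-laden stand-in, not Nyquist's analysis: it passes a rectangular pulse through a first-order RC low-pass filter (a hypothetical choice of network, integrated by forward Euler steps) and shows that the output rises gradually and trails on after the input has ended.

```python
def rc_lowpass_response(samples, dt, time_constant):
    """Forward-Euler simulation of a first-order RC low-pass network."""
    output = []
    y = 0.0
    alpha = dt / time_constant  # must be well below 1 for a stable, accurate step
    for x in samples:
        y += alpha * (x - y)
        output.append(y)
    return output


dt = 0.01                              # seconds per sample (hypothetical)
pulse = [1.0] * 100 + [0.0] * 100      # 1-second rectangular pulse, then silence
received = rc_lowpass_response(pulse, dt, time_constant=0.5)

print(f"output at end of pulse:      {received[99]:.3f}")   # has not yet reached 1
print(f"output 0.5 s after the pulse: {received[149]:.3f}")  # has not yet fallen to 0
```

The trailing tail is what overlaps the next signalling element, which is why the shape of the transmitted element has to be chosen with the network's response in mind.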

The  second  problem  is  concerned  with  the  choice  of  number  of 
signal  elements.  Minimally  two  are  required,  and  though  it  may  be  desirable 
to  use  more  than  two  'current  values,'  i.e.,  signal  elements,  there  may  also 
be  limitations. 

(We will ask the reader to take note of a serious dialectic
argument that develops here. Nyquist, validly, was arguing out the case of
electrical signalling from the level of problems of concern to a telephone
company. Thus the problem status for telegraphy, and multiplexing of messages;
for radio, with noise and fading; for submarine cables, with signal speed
limitations, and the like, are of concern to him. A dot, dash, and silence were
the elements that were viewed. A 'language' with very few letters was on his
mind. At another extreme, from whence we came, there existed a well developed
art in instrumentation in which an 'instrument' might deliver a well defined
'alphabet' of a hundred or more steps. The interrelation and conflicts of
information theory and measurement theory - metrology - will have to be
considered at some time.)

Nyquist stated what may be best described as:

The Encoding Theorem

If

s = no. of signalling elements (typically, the number of
machine 'letters' such as two states),

n = no. of signalling elements per 'character' or 'letter' used
by the information generator ('length' of the ordered array.
This typically is the 'letter' of the generator and the 'word'
of the transmitter, e.g., the act of encoding is to change
human letters from its alphabet into machine words from its
dictionary. In telegraphy, this typically might be 5),

N = total no. of 'characters' constructable (e.g., the number of
'letters' in the human alphabet, which becomes the total
'dictionary' of the machine. This typically might be 32),

then

$$ s^n = N. $$

Restated: A code using n 'places,' i.e., of 'length' n, with
s different signalling elements, can represent a dictionary of N 'dictionary
words.'
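The encoding theorem can be checked numerically. The sketch below uses illustrative function names that are not from the report; it computes the dictionary size for a given code, and the shortest whole-number code length that covers a given alphabet.

```python
def dictionary_size(s, n):
    """Number of distinct code words N = s**n for s signalling elements, length n."""
    return s ** n


def min_code_length(s, N):
    """Smallest whole number n with s**n >= N (integer arithmetic, no rounding)."""
    n, capacity = 0, 1
    while capacity < N:
        capacity *= s
        n += 1
    return n


print(dictionary_size(2, 5))    # 32, the telegraphy example in the text
print(min_code_length(2, 26))   # 5 binary elements cover a 26-letter alphabet
print(min_code_length(3, 26))   # 3 ternary elements suffice, since 3**3 = 27
```

Since the length must be a whole number, a 26-letter alphabet leaves 6 of the 32 binary code words unused.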


Transmission 'Capacity' Equation

If

m = no. of signalling elements transmitted, and
M = no. of 'characters' transmitted,

then

$$ m = nM. $$

Restated: A code of 'length' n requires m signalling elements
to transmit M characters.

From  these  two  very  rudimentary  thoughts  one  may  obtain 


$$ n \frac{dM}{dt} = \frac{dm}{dt}. $$

Restated: To transmit a given number of characters per unit
time dM/dt with a code of length n requires the transmission of a larger
number of signalling elements dm/dt.


$$ \frac{dM}{dt} = \frac{1}{n} \frac{dm}{dt} = \left[ \frac{dm/dt}{\log_b N} \right] \log_b s. $$


This  is  Nyquist's  formula. 


Restated: Assuming that a certain number of signalling
elements per unit time, dm/dt, can be satisfactorily transmitted with adequate
separation (i.e., from other signals, and from other frequency bands), and
that a fixed 'alphabet' with N letters is drawn from, then the rate at which
characters can be transmitted dM/dt is proportional to the logarithm of the
number of signalling elements used.


b = logarithmic base used.
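Nyquist's formula can be evaluated directly. In the sketch below the element rate of 50 per second is a hypothetical figure chosen for illustration, and the function name is not from the report. The formula treats the code length n = log N / log s as a continuous quantity, so for s = 3 and N = 32 the result is an idealized rate rather than that of a realizable whole-length code.

```python
import math


def character_rate(element_rate, s, N):
    """dM/dt = (dm/dt) * log(s) / log(N); the logarithmic base b cancels."""
    return element_rate * math.log(s) / math.log(N)


element_rate = 50.0  # signalling elements per second (hypothetical)
rate_binary = character_rate(element_rate, 2, 32)
rate_ternary = character_rate(element_rate, 3, 32)

print(f"2 current values: {rate_binary:.2f} characters/s")
print(f"3 current values: {rate_ternary:.2f} characters/s")
print(f"gain from the third value: {rate_ternary / rate_binary:.3f}")
```

The gain equals log 3 / log 2, which shows the logarithmic (and therefore diminishing) return from adding further current values at a fixed element rate.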

Thus, Nyquist was concerned with the designer's problem much
more than the information theory result. He argues that there is advantage in
going to more than two current values (sending units) in transmitting
intelligence. However, the practical advantage is in a moderate increase in
number, not a large number. (He shows that an estimation of the transmission
capacity will not agree exactly with the formula, for codes that are not
completely elementary. On the other hand, a printer code, of characters of
equal duration, agrees quite closely. However, these details are not pertinent
for present purposes.) For example, he points out a full two-fold gain in a
3-current-value continental code over a 2-current-value continental Morse code.
However, there are the following limitations in codes with more than 2 current
values:


a. It is ruled out whenever 'telegraphic' circuits are cheap,
so that the 2-current code is most often the most economical;

b. the absolute amplitude fluctuations do not permit
resolution of the sending units;

c.  resolution  is  limited  by  noise  interference; 

d. besides interference and fluctuations in transmission
efficiency, there are power limitations which determine the maximum number of
current values. (He is ambiguous, but his examples suggest he is talking about
the ratio of received signal power to received interference power as limiting
the current values.)

Thus,  it  appears  that  Nyquist  accepts  the  binary  telegraph 
system  as  the  foundation  of  information  transmission. 

(This  avoids  the  body  of  knowledge  in  metrology  and  instru¬ 
mentation.  If  we  have  an  instrument  that  has  a  recognizable  'alphabet'  of 
1000  states  -  for  example,  an  altimeter  that  can  be  read  as  25,320  feet  to  the 
nearest scale division - and a human coding and transmission system that can
transmit  these  numbers  'almost'  as  fast  and  reliably  as  binary  numbers,  then 
we  are  not  going  to  transmit  by  binary  numbers  but  by  human  unit  numbers, 
which  can  decide  to  use  as  many  signalling  elements  as  the  scale  sensitivity 
of  the  instrument  will  permit,  in  the  example,  4,998  signalling  elements  for 
a  50,000  foot  altimeter.  However,  even  beyond  this  we  have  been  taught  and 
teach  what  would  now  popularly  be  regarded  as  a  mixed  analogue-digital  system, 
that a scale division can be estimated reasonably by eye to 1/20th of a scale
division and that by estimating to 1/30-1/50th of a scale division,
reliability to 1/20th can be assured. This has been known and available in instrument
literature  since  the  end  of  the  last  century.  Thus  we  could  read  34,145.5 
feet  with  a  reliability  of  1  foot  with  little  extra  time  required,  since  we 
actually  may  have  50,000  signalling  elements  available  from  a  50,000  foot 
altimeter. 


The issue is not to quarrel with Nyquist's formulas, but to
point out their limited and limiting application. We agree on the basis of
experience, that it is ultimately the total social 'cost' that governs the
number of signalling elements that are used. Electrical engineers have
regarded binary codes as cheapest and have thus directed information theory,
e.g., in the same style as Hegel's justification of the Prussian state. Such
remarks are offered in the interests of forcing a deeper seated examination of
this field.

The binary code was accepted into telegraphy because of Nyquist's
second reason, namely in poor quality transmission, with a signal of meaningless
amplitude, the only two states - of a linear measure - that could be identified
were zero and one. One was anything that was not zero. You cannot convince
instrument technologists who have taken on the problem of distinguishing
measure states at all levels from one part in two to one part in 10^9 to 10^10
that all of these problems do not lie in the usable information arts. In all
cases the problem is how to transform the measure problem into one which the
human encoder, storage, retrieval, and transmission system can deal with.

Now we will grant Nyquist's formula, and that a number of
signalling elements for the human dm/dt changes with its complexity. However,
our metrological stock-in-trade is to choose that information rate which suits
the overall problem. Typically our most rapid transmission problem is
operated, quite efficiently, with the following parameters:

s = 50 (letters, numbers, some added symbols)

N = 50 (the number of 'characters')

n = 1 (one signalling element - namely, one 'grunt' per
character permits nice calm discrimination)

dM/dt = 1-2 characters per second (the faster rate is brutal to
maintain; the first is only difficult)

i.e., basically we like to transmit at

$$\frac{dM}{dt} = \frac{dm}{dt}$$

with a large number of states s.

If  a  binary  system  is  to  be  used,  it  can  transmit  information 
at  the  same  rate,  but  it  will  have  to  do  it  as  follows: 


b = 2

N = 50

s = 2

$$\frac{dM}{dt} = \frac{dm/dt}{\log_2 50}$$


If the system will transmit dm/dt = (1-2) $\log_2$ 50, or about 6
'binary digits' per second, then it can handle the human transmission system.
Since this is easy to accomplish, the telegraph system is not the limitation
but the human. In very similar fashion, it is not the human that is the
information limitation, but the measuring instrument.
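
The arithmetic of the two parameter sets above can be checked with a minimal Python sketch. It assumes the rate relation takes the form dM/dt = (dm/dt) log s / log N, which is what the two worked cases imply; the function name `element_rate` is illustrative and not from the report.

```python
import math

def element_rate(char_rate, n_chars, n_states):
    """Signalling elements per second needed to carry `char_rate` characters
    per second from an alphabet of `n_chars` characters, when each element
    can take `n_states` values.

    Assumed relation: dM/dt = (dm/dt) * log(s) / log(N), solved for dm/dt.
    """
    return char_rate * math.log(n_chars) / math.log(n_states)

for char_rate in (1, 2):
    human = element_rate(char_rate, 50, 50)   # s = N = 50: one 'grunt' per character
    binary = element_rate(char_rate, 50, 2)   # s = 2: binary telegraph
    print(f"{char_rate} char/s -> {human:.2f} 'grunts'/s, {binary:.2f} binary digits/s")
```

Under this assumption the binary system needs between roughly 5.6 and 11.3 binary digits per second to keep pace with a human sending 1 to 2 characters per second.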


The second metrological principle we have made use of for a
long time (it is likely at least 50 years old) is that the limitation of
'speed of response' in a measure is tied to the sensitivity according to the
following rule:

e = sensitivity (most often as fraction of full scale response
for linear instruments)

f = a frequency (a number that characterizes its frequency
response)

t = a time (roughly the period corresponding to that frequency,
either as a resonant frequency or as a response time constant).

This nominal 'law' is not to be derived from kinematic
concepts, as information theory has thus far been, but from dynamic limitations
in the art of building 'sensitive' instruments.

The constant c varies with the class of measurement, but much
less than any possible current theory would account for. Typically, a
sensitivity of 1 part in 1000 may require a minimal measurement time of 1 second,
1 part in 100,000 will require 10 seconds, 1 part in 10^7, 100 seconds, etc.
If we couple this concept with Kelvin's catch-phrase in metrology "To measure
is to know," then one may start to believe that the fields of information
theory and metrology are connected in dealing with information and knowledge.

The essence of the matter is that the flow of information may
be limited by the sender or the transmission system. If you are in the
transmission business, this is what interests you; but if you are in the
'information' business, it should more likely be the generation of information
that interests you. It is likely, however, that what represents the irreducible
bottleneck deserves attention. Modern transmission speed generally permits
so great a rate, that the casual sender doesn't concern himself about the
redundancy or 'garbage' in his messages, and has helped develop the myth of the
tremendous amount of information - typically scientific information - that is
in transit. It is only certain problems - jammed up against the most rapid
current or next generation computers - that show that the information
processing channels can be saturated, and that it pays to study methods for
removing the garbage in the 'information,' e.g., if everyone's Christmas message
is "Hello Mom," you need only the names of the senders. It is the irreducible
minimum information in generators that is the concern of a physical theory
portion of information theory. At such a point then, the speed limitations
of the transmission channel are not of concern.

For example, it is possible to transmit an intermittent code
of signalling elements - for standardising the signal in the case of amplitude
variations. This procedure - known as calibration, or standardisation - is
characteristic of all measurements, and its use in instrumentation is generally
so infrequent as almost not to be worth accounting for in the information
transmission rate. On the other hand, those of us with only amateur
photographic experience know how many grey scales we have had to prepare to keep
prints from a nondescript set of negatives within any kind of appropriate
contrast range. All of these arguments and many more must be stirred up in the
framework of this subject, and it is unfortunate that they haven't been stirred
in before. Besides such elementary kinematic problems as the multiplexing of
information at source and transmitter there are also the physical dynamic
problems. Information theory must be developed with a number of limiting
aspects in mind.)
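
The sensitivity figures quoted in the discussion above (1 part in 1000 at about 1 second, 1 part in 100,000 at about 10 seconds) can be used to estimate how measurement time scales with sensitivity. The sketch below assumes a power law t = a R^p, where R is the resolution expressed as "1 part in R"; that functional form is our reading of the quoted figures, not a formula stated in the surviving text, and all names are illustrative.

```python
import math

# (resolution R as "1 part in R", minimal measurement time in seconds),
# taken from the two fully legible figures quoted in the text
points = [(1e3, 1.0), (1e5, 10.0)]

def fit_power_law(p1, p2):
    """Fit t = a * R**p exactly through two (R, t) points."""
    (r1, t1), (r2, t2) = p1, p2
    p = math.log(t2 / t1) / math.log(r2 / r1)
    a = t1 / r1 ** p
    return a, p

a, p = fit_power_law(*points)

def min_time(resolution):
    """Estimated minimal measurement time (s) for 1 part in `resolution`."""
    return a * resolution ** p

print(f"fitted exponent p = {p:.2f}")
print(f"estimated time for 1 part in 1e7: {min_time(1e7):.0f} s")
```

Under this assumption the exponent comes out at one half: a tenfold increase in measurement time buys roughly a hundredfold gain in sensitivity.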

11.  A  second  paper  to  be  noted  is  Hartley's  in  1928  (14).  First, 
he  restates  the  encoding  theorem  in  a  somewhat  more  general  form: 

$$s^n = N$$

s = no. of primary symbols
n = no. of selections of primary symbols
N = total no. of possible sequences.


Originally, per Nyquist, one would have viewed this relation
as:

$$(\text{no. of signalling elements})^{\text{code length}} = \text{size of code dictionary}$$


and  considered  this  as  referring  to  code  length  of  letters  and  the  machine 
dictionary  of  letters.  Hartley  likely  viewed  that  the  number  of  primary 
symbols  may  be  considered  fixed  in  operation,  and  that  the  code  length  of 
primary  symbols  increases  as  the  communication  proceeds  (i.e.,  as  the  length 
of  the  total  ordered  array  grows)  so  that  the  information  grows.  The  quantity 
N  is  now  essentially  the  number  of  messages.  (Example  -  26  letter  signalling 
elements, each chosen independently, for a certain transmission length - say
13 telegraphed symbols - permits 26^13 possible messages.) The quantity N was
regarded  by  Hartley  to  be  a  measure  of  the  information  involved. 


12.  Basically,  Hartley  wanted  information  in  the  selection  process 
to  be  associated  uniquely  with  N,  and  chose  the  parameter  I  "the  amount  of  in¬ 
formation  associated  with  n  selections"  to  be 


$$I = \log_b N$$

b = an arbitrary log base

Because of his choice of a logarithmic definition, in

$$I = n \log_b s$$

he succeeded in endowing 'information' with a number of properties that he
wanted, such as proportionality to the number of selections, i.e., he wanted
information to grow as the number of selections increased, and to depend only
on the total number of possible symbol sequences, i.e., only on $s^n$.
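
Hartley's measure can be illustrated with a short Python sketch using the 26-letter, 13-selection example quoted earlier. The function name `hartley_information` and the choice of base 2 are assumptions made for illustration only.

```python
import math

def hartley_information(n_selections, n_symbols, base=2):
    """Hartley's measure I = n * log_b(s), equivalently log_b(s**n)."""
    return n_selections * math.log(n_symbols, base)

n_messages = 26 ** 13                  # N = s**n possible 13-letter sequences
i_13 = hartley_information(13, 26)     # information in 13 selections
i_5 = hartley_information(5, 26)
i_8 = hartley_information(8, 26)

print(n_messages)
print(i_13, math.log2(n_messages))     # n log s agrees with log N
print(i_5 + i_8)                       # 5 selections then 8 more add up to 13
```

The logarithm is what makes the measure additive: the information of a message inserted into a longer sequence depends only on its own number of selections.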

(This step is likely now regarded as crucial in the 'kinematic'
theory of information. Each reader will have to justify its purpose
in  his  own  mind.  The  key,  from  the  electrical  engineer's  view,  is  that  this 
definition  permits  an  'insertion'  type  concept,  where  particular  information 
can  be  inserted  into  a  long  continuing  array  of  signals  and  be  specifically 
associated  with  the  selection  array  of  that  incremental  message.  However, 
its  mystique  created  the  need  for  further  exposition.  Why  the  delay  from 
1928 to Shannon, 1948, for further exploration is a subject for more detailed
historical  research.) 

13. If n = 1, the information associated with a single selection
of primary symbols (such as 2 current values, or 26 letters, etc.) is I =
$\log_b$ s. If a character (a machine word) involves n selections (such as 5 in
a binary code), I = n $\log_b$ s. Thus far this is satisfactory for telegraphy.
However, the 'character' may be secondary. In speech, for example, s may be
regarded as the number of words. Thus the actual numerical value of
information can change from one context to another, and it will also depend on the
logarithmic base. (Hartley did not write with the greatest of clarity. Yet
it is clear that for any particular engineering application - telegraphy, or
other 'mechanistic' tasks - one had a useful user's measure to characterize
transmission properties. However, the philosophy of 'information' in a
physical sense or a biological sense was really not tackled. A telephone company's
task, on the other hand, was.)

14. The encoding law and the definition of information can now be
used to seek out the physical mechanisms that limit information transmission.
It is to be assumed that there should be temporal independence (no confusion)
in receiving signals which were sent, by virtue of the transmission system
network characteristics. (The encoding law refers to messages sent, not to their
reception.) Thus, one finds for the information rate


$$\frac{dI}{dt} = \frac{dn}{dt}\,\log_b s$$


Various networks may be considered to determine their limitations on
information rate. Hartley finds

a.  a  charging  time  constant  limitation, 

$$\frac{dI}{dt} \sim \frac{1}{RC}$$

b. a system frequency response limitation, using a low pass
filter network as an example, dI/dt ~ $f_0$ ($f_0$ = cut-off frequency of the filter).

These limitations are both imposed by the requirement of
resolving a signal from the following signal, in the light of network
characteristics.

15. The optimal information rate and the optimal transmission
rate (from the characteristics of the network) may not coincide, and
information transformation may be necessary. As an example, signal modulation may be
necessary to fit a low message rate to a high transmission rate requirement
(wireless propagation can only take place at high frequency). As another
extreme example, transmission on parallel lines can be used if the transmission
rate is low, or there is a time delayed storage of message on 'record,' and
its transmission is then effected at a lower rate.


In  any  case,  Hartley  has  demonstrated  that  the  amount  of  re¬ 
solvable  information  transmitted  by  a  network  has  the  limitation 

$$I \sim f_0 t$$

$f_0$ = frequency 'band-pass'
t = time available for transmission, or

I ~ 'wave number range' x 'record length' (if the information
is recorded in 'space,' i.e., the 'frequency' and 'space-time'
product in all cases.)

Thus 


$$\frac{dI}{dt} \sim f_0 \sim \frac{dn}{dt}\,\log_b s$$


Restated:  The  rate  of  'information'  transmission,  which  is 
proportional  to  the  rate  at  which  signalling  elements  are  sent  and  to  the 
logarithm  of  the  number  of  signalling  elements  used,  which  can  be  resolved  by 
passage  through  a  network  is  measured  by  the  cut-off  frequency  or  band-pass 
frequency  of  the  network. 


(One  should  note  that  the  'information'  concept  here  is  a 
purely  kinematic  concept,  and  the  physical  'network'  concept  here  is  a  purely 
linear  network  concept  whose  dynamics  are  replaced  by  only  one  overall  idea, 
the  frequency  band  to  which  the  network  can  respond.  The  statistical  mechan¬ 
ics  of  systems  is  not  invoked.) 


16. While the subject of statistical fluctuations was well rooted
in statistical mechanics, as can be noted in (4), (5), and (6), the
introduction of the subject of 'noise' into networks and information theory likely
originated in the work of Schottky, and in the Johnson-Nyquist treatment of
thermal noise. Moullin (7) is a suitable beginning from which to trace the
equivalent source concept of noise in the network. For example, Nyquist gave
Johnson noise in a resistor 4 kT df as the noise power generated and
distributed uniformly in the frequency band, df, due to temperature T, where k is
Boltzmann's constant. He further gave the current appearing in the output due to
the transform of the network.
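
To give a sense of scale for thermal noise, here is a minimal Python sketch. It assumes the standard form of the Johnson-Nyquist result, a mean-square open-circuit voltage of 4kTR df across a resistance R (the text above quotes only the factor 4 kT df); the resistor value, temperature, and bandwidth are illustrative choices, not figures from the report.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant, J/K

def johnson_noise_vrms(resistance_ohm, temperature_k, bandwidth_hz):
    """RMS open-circuit thermal noise voltage, sqrt(4 k T R df)."""
    return math.sqrt(4.0 * BOLTZMANN_K * temperature_k
                     * resistance_ohm * bandwidth_hz)

# Illustrative case: 1 kilohm resistor at 300 K observed over a 10 kHz band
v = johnson_noise_vrms(1e3, 300.0, 1e4)
print(f"{v * 1e6:.3f} microvolts rms")
```

Because the noise power is spread uniformly over frequency, the rms voltage grows only as the square root of the bandwidth.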


Rice's papers (8) carry out in considerable detail the theory
of noise in networks from a number of sources. His main concern is with the
statistical properties of noise in the output. He introduces the concept of
analysis by the techniques of power spectra and correlation. This has become
popularized among modern engineers through the text of Blackman and Tukey,
THE MEASUREMENT OF POWER SPECTRA (Dover, 1958). Rice offers as his source
"The correlation function ... apparently was introduced by G.I. Taylor"
(1920). "Recently it has been used by quite a few writers in the
mathematical theory of turbulence" (Goldstein - MODERN DEVELOPMENTS IN FLUID DYNAMICS,
Oxford, 1938).

(Very validly, one may view Rice's article as indication that
the bridge from statistical mechanics to the analysis of noise in the network
had been well constructed and was in process of becoming a working tool in the
field. Similarly Chandrasekhar's paper (6) did broadcast that a well
developed art existed for treating stochastic problems. It commonly comes as a
surprise to many specialists in this field that others outside the field seem
to have some familiarity with the problems. We can cite, from our own
personal background and experience, that the techniques of the statistical theory of
turbulence had been widely discussed and disseminated in hydrodynamics and
fluid mechanics. Thus, just as Wiener had to defend himself on the relation
of his work to Kolmogoroff's on time series stating that "... the study of the
... problem was the next thing on the agenda," we believe the study of
uncertainty, error, and noise was timely for the scientific agenda in the 30's and
40's.)


As  the  publication  of  Lawson  and  Uhlenbeck  (9)  indicates,  a 
large  literature  on  signal  and  noise  in  networks,  its  relation  to  statistical 
mechanics,  and  the  abstraction  of  information  from  networks  had  already  come 
into  existence  by  1950.  We  will  not  pursue  this  direction.  It  is  sufficient 
to  point  to  such  sources  as  Khinchine  (22)  or  Brillouin  (23)  for  the  broader 
physical-philosophic  connections  with  statistical  mechanics. 

17. It is widely regarded that Shannon's 1948 paper begins the
modern communications engineering theory of information. In the introduction
to that paper it was stated: "The recent development of various methods of
modulation ... which exchange bandwidth for signal-to-noise ratio has
intensified the interest in a general theory of communication. A basis ... is
contained in the ... papers of Nyquist and Hartley ... In the present paper we
will extend the theory to include ... new factors, in particular ... noise in
the channel, and the savings ... due to the statistical structure of the
original message ... and the nature of the ... destination of the information."
(It is clear that Shannon's concern was mainly with transmission of words or
pictures over electrical transmission systems - the Bell Labs problem.)

While  there  is  a  semantic  aspect  to  communications,  the  engi¬ 
neering  problem  is  the  faithful  transmission  of  one  message  selected  from  a 
large  but  finite  set  of  messages  from  one  point  to  another  through  a  trans¬ 
forming  network.  Any  monotonic  function  of  the  number  of  possible  messages 
(i.e.,  as  given  by  the  encoding  theorem)  is  a  measure  of  information,  but 
Hartley’s  logarithmic  function  is  a  natural  and  convenient  choice,  although  it 
will  require  generalization.  The  choice  of  a  base  corresponds  to  choosing  a 
unit. If the base is 2, the units may be called binary digits, or per Tukey,
bits; if base 10, then decimal digits, etc. A two position switch stores one
bit,  a  digit  wheel  stores  one  decimal  digit. 

A communications system may be regarded as a chain of five
components - an information source generating some function of time, a
transmitter that transforms the function of time message into a signal that can be
transmitted over a channel (ambiguous - it is not clear whether he means the
network) through which the signal is transmitted, a receiver that reconstructs
the message from the signal, and the destination for whom or which the message is
intended. The signal in the channel may be perturbed by noise.
Communication systems may be discrete (the message and signal are discrete
symbols - in telegraphy, the message is a letter sequence, the signal is a
dot-dash-space sequence); continuous (the message and signal are both continuous
functions - e.g., radio or television); or mixed (both discrete and continuous
variables appear). The theory of the discrete case is a foundation for the
others.


18. Shannon starts with Hartley's definition of the information
in an encoded message (modified to take into account varying lengths for
different signalling elements such as dot-space, dash-space, letter-space, and
word-space - however these details are not of present concern).


$$\frac{dI}{dt} = \lim_{t \to \infty} \frac{1}{t}\,\log_b N(t)$$

N(t) = no. of signals allowed in the time t

dI/dt = information capacity of the channel in the presence of
the discrete signals and no noise.


For  example  -  typically  -  base  2  will  be  used,  so  that  the 
capacity  may  be  specified  as  the  number  of  binary  digits  -  bits  - 
per  second  required  to  specify  the  particular  signal  used. 


19. However, he now wishes to consider the characteristics of the
information source. He will regard the transmission of information as
messages in the English language as a typical problem. (One will note that
he has not defined information as a human using English now, but the
retrospective problem of what are the statistical properties of the class of past
messages in English. The problem is certainly valid as a Bell Labs problem,
and offers some insight into the kinematics of information. It does not deal with
the dynamic problem of the information source. This more subtle distinction
will come into fuller focus as this report develops.)


Shannon now points out that the information system does not
generate messages, say from English letters, as 26 choices x 26 choices x 26
choices, etc., but with probabilities associated with various types of chains
of sequences. Thus there are other stochastic processes than just a simple
equiprobability distribution. Examples are given to illustrate stochastic
'language' messages constructed from a lowest zero-order approximation
(independent equiprobable symbols), to those possessing the probabilities of two
or more letter chains as used in English, to even greater complexity. The
problem description is identified as lying within the field of Markov
processes. (As part of a stochastic model of language, in 1913 Markov examined
20,000 letters in Pushkin's novel EUGENE ONEGIN in developing a theory of
chains of symbols.)
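
The difference between a zero-order approximation and a chained (Markov) approximation can be sketched in Python as follows. This is an illustrative sketch only: the tiny sample text, the 27-symbol alphabet, and the function names are assumptions of ours, and a realistic model would be estimated from a large body of past messages.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def zero_order(length, rng):
    """Zero-order approximation: independent, equiprobable symbols."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def digram_model(sample):
    """Count, for each symbol in the sample, which symbols follow it."""
    followers = defaultdict(Counter)
    for a, b in zip(sample, sample[1:]):
        followers[a][b] += 1
    return followers

def markov_chain(followers, start, length, rng):
    """Generate symbols so that each depends on its predecessor only."""
    out = [start]
    for _ in range(length - 1):
        counts = followers.get(out[-1])
        if not counts:            # symbol never seen with a successor
            break
        symbols = list(counts)
        weights = [counts[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)

# Hypothetical stand-in for a collection of past messages
sample = "the theory of information is the theory of the selection of symbols "
rng = random.Random(0)
model = digram_model(sample)
print(zero_order(40, rng))
print(markov_chain(model, "t", 40, rng))
```

The chained output contains only letter pairs that occur in the sample, which is the sense in which the statistics of past messages restrict the sequences the source will produce.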

Stemming from the similarity of a message 'space' to the phase
space of statistical mechanics which has been embedded in Gibbs' concept of
an ergodic process, this formalism is applied to information theory. (An
ergodic process is one whose statistical properties in a phase space in which
all possible states of a system are shown are representative of the course of
change of any particular system in time, i.e., the averages over all systems
in phase space agree with the averages of any system in time.) In an ergodic
process every sequence produced by the process - if long enough - is the same
in statistical properties, i.e., it implies statistical homogeneity.

Thus, unlike Hartley, who viewed information as associated
with all possible sequences, Shannon is concerned only with those
sequences that satisfy equilibrium constraints. How much information 'choice'
is then involved?

Shannon's  approach  was  to  seek  a  'restriction'  on  the  amount 
of  information  by  weighing  the  choices  in  accordance  with  their  probabilities. 

Thus, suppose we can recognize n chain 'views,' or 'states' of
a message process such that their probabilities are disjoint and summable to
unity. Let us define the probabilities associated with these states by
$p_1, p_2, \ldots, p_n$, with $\sum p_i = 1$. Shannon proposes as a measure of
information produced in such a process that

$$H = -k \sum_{i=1}^{n} p_i \log_b p_i$$

where

k = a constant

H = a measure of information content.

(Shannon takes k = 1, if b = 2.)

If the probabilities are equal, i.e., $p_i = 1/n$, then H = k
$\log_b$ n, which is the Hartley result, if n is regarded as the number of all of
the "events" that may take place, where the "events" may lie at such extremes
as the number of independent symbols or the number of independent complete
messages. This measure H is regarded as the "entropy" of the set of
probabilities.
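
The entropy measure and its reduction to the Hartley result can be sketched in a few lines of Python. The function name `shannon_entropy` and the example probabilities are illustrative assumptions; base 2 with k = 1 follows the convention noted above.

```python
import math

def shannon_entropy(probs, base=2, k=1.0):
    """H = -k * sum(p_i * log_b p_i); states with p_i = 0 contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to unity")
    return -k * sum(p * math.log(p, base) for p in probs if p > 0)

n = 8
uniform = [1.0 / n] * n                  # equal probabilities
skewed = [0.5, 0.25, 0.125, 0.125]       # unequal probabilities
certain = [1.0, 0.0, 0.0]                # outcome known in advance

print(shannon_entropy(uniform), math.log2(n))   # equals the Hartley value log2 n
print(shannon_entropy(skewed))                  # less than log2 4 = 2 bits
print(shannon_entropy(certain))                 # no information
```

With equal probabilities the measure coincides with log n, and any inequality among the probabilities lowers it.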


(It is obvious from Shannon's references - Tolman - and
language - ergodic, entropy, etc. - that he was guided by the statistical
mechanical derivation of the equilibrium state of an ensemble of 'atoms' in a phase
space due to equipartition. It is instructive to note the minimum ideas that
make up the statistical mechanical argument.

A 'molecule' with f degrees of freedom may be represented as a
point in a phase space of 2f generalized coordinates and momenta - such as 6
dimensions for a monatomic molecule. A system of N molecules can be represented
as a point in a 2fN hyperspace, or as a distribution of points in a 2f space.
The temporal motion of this point in hyperspace - its trajectory - is
described by Newton's laws of motion. If one considers all such systems,
subject to certain constraints, such as constant large number, and constant
total energy, then such canonical systems have the ergodic property that at
equilibrium, the equilibrium properties which are time averages over the
trajectory, coincide with the space averages over the ensemble in phase space.
Our first concern is the equilibrium distribution of states in phase space,
for this then also indicates the 'usual' near-equilibrium states in time.

Since the molecular distribution in phase space is not
expected to have a scale, until one gets down to uncertainty, or fluctuation
limitations, one can arbitrarily divide the phase space into a large number
of equal small cells, denumerable as 1 ... i. In each cell there will be
a number of molecules that can be assumed to be large, i.e., it is assumed
that the distribution of states is large enough to be regarded as nearly
'continuous.' Let $n_j$ be the number of molecules in the jth cell. Then the
number of distributions M of molecules in phase space is given by


$$M = \frac{N!}{(n_1!)\,(n_2!) \cdots (n_i!)}$$

since the number of possible arrangements for the distribution $n_1 \ldots n_i$ is
the number of combinations of N things taken $n_1, \ldots, n_i$ at a time.

Taking the log of both sides

$$\ln M = \ln N! - \ln n_1! - \ln n_2! - \cdots - \ln n_i!$$

and using Stirling's approximation for large factorial numbers

$$\ln M \approx N \ln N - n_1 \ln n_1 - n_2 \ln n_2 - \cdots - n_i \ln n_i$$

This step produces the N ln N term that Shannon was seeking. Completing the
statistical mechanical argument, we have also the constraints

$$\sum_j n_j = N$$
$$\sum_j \epsilon_j n_j = E$$

$\epsilon_j$ = energy of a molecule in the jth cell
E = total energy

It  is  required  that  the  number  of  distributions  be  a  maximum 
for  the  equilibrium  distribution  of  N  molecules.  Thus 

$$d \ln M = 0 = -\sum_j \ln n_j \, dn_j$$
$$\sum_j dn_j = 0$$
$$\sum_j \epsilon_j \, dn_j = 0$$

Multiplying the second equation by $\alpha$ and the third by $\beta$ - Lagrangian
multipliers - and adding to the first,

$$\sum_j (\ln n_j + \alpha + \beta \epsilon_j)\, dn_j = 0$$

so that for any j

$$\ln n_j + \alpha + \beta \epsilon_j = 0$$


is the distribution of molecules at equilibrium in each cell, or the
probability $p_j = n_j/N$ is given by

$$p_j = C e^{-\beta \epsilon_j}$$

Replaced by its continuous expression

$$dP = C e^{-\beta \epsilon}\, dq_1 \cdots dq_f \, dp_1 \cdots dp_f$$

is the Maxwell-Boltzmann distribution of molecules in a phase space of f
generalized degrees of freedom with coordinates $q_i$ and $p_i$. The remainder of the
statistical mechanical arguments do not concern us.

This is likely the structure that guided Shannon. The name
'entropy,' or 'information' for the quantity $\sum p_i \log p_i$ was a convenience -
and that is all - and it is not to be taken too seriously. This mathematical
statement  and  its  assumptions  as  Shannon  points  out,  "are  in  no  way  necessary 
for  the  present  theory.  It  is  given  ...  to  lend  ...  plausibility  to  ...  later 
definitions.  The  real  justification  of  those  definitions  ...  will  reside  in 
their  implications.") 
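
The link between the counting argument and the p log p form can be checked numerically. The Python sketch below compares the exact ln M with the leading-order Stirling expression and with -N Σ p ln p; the cell occupation numbers are arbitrary illustrative values and the function names are ours.

```python
import math

def ln_multiplicity(counts):
    """Exact ln M = ln N! - sum_j ln n_j!, using the log-gamma function."""
    n_total = sum(counts)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)

def stirling_estimate(counts):
    """Leading-order Stirling form: N ln N - sum_j n_j ln n_j."""
    n_total = sum(counts)
    return (n_total * math.log(n_total)
            - sum(n * math.log(n) for n in counts if n > 0))

counts = [5000, 2500, 1250, 1250]      # illustrative occupation numbers n_j
n_total = sum(counts)

exact = ln_multiplicity(counts)
approx = stirling_estimate(counts)
entropy_form = -n_total * sum((n / n_total) * math.log(n / n_total)
                              for n in counts)

print(exact, approx, entropy_form)
```

The Stirling expression is algebraically identical to -N Σ p ln p with p_j = n_j/N, and for large occupation numbers it differs from the exact count only by terms logarithmic in the n_j.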

Now guided by the statistical mechanical result, Shannon points
out that the information function H, 'Shannon's entropy,' has properties of
interest to him from an information point of view. If all the p's but one are
zero, so that the remaining one is unity, H has the value 0, i.e., no
information because the outcome is known. (All the 'messages' are 'A, A, A, ...' or
'Hello, Mom.') H will be a maximum when all the p's are equal and
equal to 1/n, so that H = + k $\log_b$ n, the Hartley result.

(We now come close to the heart of the matter as far as it
concerns Shannon. In so doing we are providing an interpretation of Shannon's
views, which may not be correct. However, in taking this step we can bring up
a substantive issue that is disturbing.

Shannon does not make explicitly clear, nor did Hartley, what
is  the  total  generalization  that  is  wanted  for  the  content  of  a  'message.'  It 
is  equally  clear  -  in  quickly  reviewing  a  half  dozen  statistical  mechanical 
books  -  that  the  statistical  mechanical  discussions  also  tend  to  be  somewhat 
confusing.  One  is  permitted  to  select  for  the  ensemble  individual  molecules, 
a  collection  of  molecules,  all  similar  collections  of  such  molecules,  etc.,  as 
representing  different  concrete  systems  that  may  be  ambiguously  discussed. 
Similarly, in messages we are talking about an ambiguous collection, even if
we  said  English  messages.  We  may  use  signalling  elements  to  denote  letters, 
words,  abbreviations,  phrases,  messages,  etc.  We  believe  that  Shannon  consid¬ 
ered  all  of  these  possibilities,  i.e.,  all  of  the  possibilities  that  may  be 
used  by  telephone  companies,  etc.  Thus  the  assignment  of  the  probability  of 
occurrence of a chained element, i.e., of a Markov chain, was not an a priori
assignable  step,  but  one  to  be  discovered  by  experience,  namely  from  a  large 
collection  of  past  messages.  However  these  chains  would  not  all  be  alike  - 
they  might  mix  apples  and  oranges,  i.e.,  they  represent,  most  closely,  that 
sequence  of  signal  elements  that  a  skilled  shorthand  writer  might  develop  as 
a  personal  code.  However  in  order  to  assess  the  'information  content'  of  a 
series  of  probability  of  occurrence  of  these  various  elements,  as  we  have 
stressed,  the  choice  of  probabilities  must  be  disjoint.  This  is  no  longer 
physics, but mathematics. This doesn't sink the concept, but it makes it difficult
to apply physical law - such as Newton's laws - to the argument to
justify  principles.  The  result  to  be  obtained  is  purely  kinematic,  i.e., 
involving  space  and  time.  Dynamic  elements  can  only  enter  into  the  physical 
transmission  network. 

Now  the  chain  of  disjoint  elements,  made  up  of  such  diverse 
subject  matter  as  i  before  e,  two  spaces  can't  come  together,  e  is  the  most 
common  letter  in  English,  'the'  is  the  most  common  word  in  English,  complex  or 
long  company  names  can  be  abbreviated  and  coded,  the  cliches  of  language  per¬ 
mit  stock  phrases,  English  has  a  certain  level  of  redundancy,  etc.  can  only 
be  discovered  by  a  Bayesian  logic.  Propose  some  probability  distribution  and 
test  it  to  see  if  it  works  economically.  This  is  what  Shannon  was  trying  to 
get  at.  The  invoking  of  the  concept  of  'Shannon's  entropy'  was  a  reminder  - 
or a demonstration - that to get the most information encoded, pursuing Hartley's
definition of information content, required the kind of distribution of elements
in a message phase space like the Maxwell-Boltzmann distribution. Specifically,
for a given number of cells in the message phase space, the highest
amount  of  'Shannon's  entropy,'  information,  would  exist  if  the  probabilities 
in  the  various  cells  were  equal. 

However,  we  don't  understand  the  assignment  yet  -  except  by 
practical  testing.  We  would  suppose  that  one  chooses  something  like  a  binary 
code  signalling  element,  and  a  six  place  ordered  (letter)  array,  so  that  a  64 
cell dictionary is available for 'messages.' The problem is to choose that
'dictionary' that is most nearly used 'equiprobably' in space or time; that
such a dictionary assignment can only be made experimentally by cut and try
to determine its actual experience; and that at some later time one might examine
whether a seven place letter array might not produce greater speeds than
all of the six place arrays tested.)

20. Suppose there is a long message of N symbols (a symbol is what
is represented by the ordered array from the machine 'dictionary.' It will
correspond to the number of molecules in the statistical mechanical system),
and that there are n symbols (the 'words' in the dictionary; or the cells in
the statistical mechanical phase space). Let $p_i$ be the probability of occurrence
of the ith symbol. In a long message, the probability of occurrence p
of any particular message will be

$$p = p_1^{p_1 N} \, p_2^{p_2 N} \cdots p_n^{p_n N}$$

the factor $p_i$ representing the probability of the ith symbol, the exponent
$p_i N$ representing nearly the number of occurrences of the ith symbol, and the
product of factors representing their independence. Then

$$\ln p = N \sum_i p_i \ln p_i = -NH$$

or

$$H = -\frac{\ln p}{N}$$

or 'Shannon's entropy,' the incremental information of a long message sequence
of N symbols drawn from n exclusive symbols in a code book (a 'dictionary') is
the negative log of the probability of any particular long message sequence
divided by the number of symbols in the sequence.
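
As a numerical check of this relation, the sketch below builds the symbol counts of one 'typical' long message and compares -ln p / N with the entropy computed directly. The four probabilities and the message length N = 8000 are assumed purely for illustration.

```python
import math

# Hypothetical dictionary of n = 4 symbols and a long message of N symbols.
probs = [0.5, 0.25, 0.125, 0.125]
N = 8000

# In a typical long message the ith symbol occurs about p_i * N times.
counts = [round(p * N) for p in probs]

# ln p = sum_i (p_i N) ln p_i for that particular message.
ln_p = sum(c * math.log(p) for c, p in zip(counts, probs))

# Entropy recovered from the message probability, and computed directly.
H_from_message = -ln_p / N
H_direct = -sum(p * math.log(p) for p in probs)

print(H_from_message, H_direct, H_direct / math.log(2))
```

The last printed value converts from natural units to binary units per symbol.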

21. Since the actual probabilities with a given code book ('alphabet,'
or 'dictionary') for a given message source may not provide equiprobable
maximum 'entropy' messages, Shannon defines the 'relative entropy' as the ratio
of H to the maximum value it could have with that 'alphabet.' One minus the
relative entropy is the redundancy. For example, using the English alphabet
and English messages, the redundancy is about 50%. (This means approximately
that

$$-\sum_i p_i \ln p_i = 0.5 \ln 26$$

or supposing that some letters are equiprobable and the others have zero probability,
then $0.5 \ln 26 = \ln n$. This represents a need for approximately 5
letters. However this probability distribution is far from reality, for as
Shannon points out one can delete 13 letters in English.)
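
The arithmetic behind the 'approximately 5 letters' can be written out as a short sketch. It assumes a relative entropy of exactly 0.5 and a 26 letter alphabet; the variable names are ours.

```python
import math

alphabet_size = 26
relative_entropy = 0.5            # H divided by its maximum, ln 26
redundancy = 1.0 - relative_entropy

# Number n of equiprobable letters giving the same entropy:
# ln n = 0.5 * ln 26, i.e. n = sqrt(26).
n_equivalent = math.exp(relative_entropy * math.log(alphabet_size))

print(redundancy, n_equivalent)
```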

(This concept would seem parochial since it requires a comparison
of content for the same transmitting alphabet, just encoded differently.
Shannon's remark describing the relative entropy does not help: "This, as will
appear later, is the maximum compression possible when we encode into the same
alphabet." However, from Khinchine (47) it is clear that the concept of redundancy
and compression deals with the same alphabetical language, encoded
differently. Though "each sequence from one text at hand is coded
into  the  same  alphabet,"  the  rules  of  coding  will  require  "that  different  se¬ 
quences  of  uncoded  text  must  be  coded  differently,"  i.e.,  "by  using  as  short 
a  coding  as  possible  for  the  most  commonly  encountered  sequences  ..."  Thus 
one  crucial  ultimate  step  is  the  encoding  of  a  composite  dictionary  of  let¬ 
ters,  words,  phrases  by  probabilities  of  occurrences  into  a  dictionary  of 
letters,  words,  phrases  using  the  same  letters  but  coded  into  sequences  which 
are  as  short  as  possible  for  the  more  common  sequences,  and  relatively  longer 
for the less common. For example, the few hundred thousand words that make
up the English language could be coded by a four place 'word,' of which CTEV
would be typical. The dictionary could represent a 'translation' from English
letter-word-phrase-message-book probability sequence to English letter
code 1-2-3-4-5-etc. 'word' sequence, i.e., a one letter 'word' is a letter, a
two  letter  'word'  may  stand  for  instructions,  a  three  letter  'word'  may  stand 
for  common  messages,  a  four  letter  'word'  may  stand  for  all  the  words  in  the 
English  language,  etc.  One  has  an  uneasy  feeling  that  most  of  these  questions 
have  been  faced  in  the  past  by  linguists  and  in  crypto-analysis.  However,  we 
will  go  along  and  attempt  to  'discover'  what  is  known.) 

22.  The  operations  performed  in  encoding  and  decoding  discrete 
information  can  be  described  basically  by  the  properties  of  switch  networks, 
viewed  as  two  port  (four  terminal)  networks,  with  internal  switch  states 
viewed  as  memory.  According  to  Shannon,  the  transmitter  encodes  information 
from  the  information  source  in  an  internal  linkage,  a  'transducer.'  (In  in¬ 
strument  parlance,  we  have  been  willing  to  start  from  the  electrical  concept 
of  a  transformer,  and  generalize  it  to  a  device  that  transforms  one  physical 
quantity  into  a  like  physical  quantity.  We  have  accepted  the  concept  of  a 
transducer  as  one  that  changes  one  physical  quantity  into  another  physical 
quantity. Shannon's use of transducer is much more specialized. It is likely
what may have been considered a transponder in electricity. He states that
its input is a sequence of input symbols and its output a sequence of output
symbols. However, it may have internal memory so that its output depends on
its past history as well as the present input symbol.) Shannon's informational
'entropy'  may  be  conserved  from  input  to  output,  or  at  most,  some  may  be  lost. 

23. Suppose in the large number of signals N(t) of average duration
t there are constraints in the number of symbols $s_1 \ldots s_n$ so that these
symbols have durations $t_1 \ldots t_n$ (example of 'symbols' - dot, dash, dot plus
letter space, dash plus letter space, dot plus word space, dash plus word
space), then the information capacity which the channel (which can discriminate
signalling elements) will permit from the output of a constrained transducer
is given by

$$C = \frac{1}{t_0} \log_2 W$$

where W is the largest real root of

$$\sum_{i=1}^{n} W^{-t_i/t_0} = 1$$

('Proof' - else this will be considered mysterious - is based on Hartley's
concept of information rate in a transmission system

$$C = \lim_{t \to \infty} \frac{\log_2 N(t/t_0)}{t}$$

We need the result

$$N(t/t_0) = \sum_{i=1}^{n} N\big((t - t_i)/t_0\big)$$

where

$t_0$ = a real base unit of time, likely a discriminatable unit
of time such as $1/f_0$ for a channel of frequency bandwidth $f_0$.

$t_i$ = essentially discrete signal times for different symbols
$s_1 \ldots s_n$ of an 'alphabet.'

$s_i$ = the symbols of an 'alphabet.'

t = a quantized long portion of time that is commensurate with
a linear combination of signal times (i.e., time is not
continuous but only a not-so-dense set of Diophantine
mesh points).

$N(t/t_0)$ = no. of all possible message sequences of symbols.

If all such messages were laid out - being quantized - one
would see that some end in the symbol $s_i$ associated with $t_i$, etc. Thus the
total number of all such message sequences is given by these mutually exclusive
but jointly exhaustive partial sums. There are $N((t - t_i)/t_0)$ associated
with $t_i$ endings, etc. or therefore the above result.

Now there is a mathematical theorem (see for example Brillouin
(41), end of Chapter 4) that this finite difference equation has a real asymptotic
solution for large t

$$N(t/t_0) \approx A \, W^{t/t_0}$$

from which the capacity follows.)

(As Wolfowitz remarks "Due to a convention of no importance but hallowed by
tradition (of more than fifteen years!), all the logarithms in this monograph
will be to the base 2.")

In the case of n equal symbols

$$W = n^{t_0/t_i}, \qquad C = \frac{1}{t_i} \log_2 n$$

(Suppose, for example, all 32 letters of a real alphabet were coded by a 5
place code, so that each of the n = 32 symbols had equal duration $t_i = 5 t_0$, then
W = 2. Then the information rate would be $1/t_0$ bits per second.)
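
The counting argument can be tried out directly. The sketch below is our own illustration, with durations expressed as integer multiples of the base time unit: it counts message sequences by the finite difference recurrence and finds W by bisection. The helper names and the bisection bounds are assumptions of the sketch.

```python
import math

def largest_root(durations, iters=200):
    """Largest real W with sum(W**-d) == 1, found by bisection.

    The sum decreases in W, exceeds 1 at W = 1 when there are two or
    more symbols, and is below 1 at W = n + 1 when every d >= 1.
    """
    lo, hi = 1.0, float(len(durations)) + 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(mid ** -d for d in durations) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def count_sequences(durations, T):
    """N(t) = sum_i N(t - t_i), with N(0) = 1 and N(t) = 0 for t < 0."""
    N = [0] * (T + 1)
    N[0] = 1
    for t in range(1, T + 1):
        N[t] = sum(N[t - d] for d in durations if t - d >= 0)
    return N

# The 32 letter, 5 place example: every symbol lasts 5 base units.
equal = [5] * 32
W_equal = largest_root(equal)
N_equal = count_sequences(equal, 50)
rate_equal = math.log2(N_equal[50]) / 50   # binary units per base unit

# Two symbols of unequal duration (1 and 2 base units), for contrast.
W_unequal = largest_root([1, 2])
N_unequal = count_sequences([1, 2], 10)

print(W_equal, rate_equal, W_unequal, N_unequal[10])
```

For the equal duration case the growth rate of the count equals log2 W exactly; for unequal durations the agreement is only asymptotic, as the theorem cited from Brillouin states.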


24. If the transducer is constrained to a finite number of states,
and if a statistical message source exists whose probability of symbol usage
conforms to a particular distribution, then Shannon's 'entropy' H is maximum
and equal to $\log_2 W$ bits per symbol.

Let $l_{ij}^{(s)}$ be the length of the sth symbol in passing from state
i to state j (i.e., $t/t_0$). For any particular state i, the 'entropy'
associated with transitions of probability $p_{ij}^{(s)}$ to state j by virtue
of symbols s is

$$H_i = -\sum_{j,s} p_{ij}^{(s)} \log_2 p_{ij}^{(s)}$$

If $P_i$ is the probability for the various states then the 'entropy' of the
information source will be

$$H = \sum_i P_i H_i = -\sum_{i,j,s} P_i \, p_{ij}^{(s)} \log_2 p_{ij}^{(s)}$$

We can show that if the p's have an appropriate value, then H will be maximum.
To this end normalise H by

$$\bar{l} = \sum_{i,j,s} P_i \, p_{ij}^{(s)} \, l_{ij}^{(s)}$$

Let

$$p_{ij}^{(s)} = \frac{B_j}{B_i} W^{-l_{ij}^{(s)}}$$

where

$$B_i = \sum_{j,s} B_j W^{-l_{ij}^{(s)}}$$
(This system is satisfied by the solution for W, for

$$\left| \sum_s W^{-l_{ij}^{(s)}} - \delta_{ij} \right| = 0$$

according to the determinant equation for W. Then

$$\sum_{j,s} p_{ij}^{(s)} = \frac{1}{B_i} \sum_{j,s} B_j W^{-l_{ij}^{(s)}} = 1$$

so that the probability of any junction is unity.)
With these probabilities

$$H = \log_2 W \sum_{i,j,s} P_i \, p_{ij}^{(s)} \, l_{ij}^{(s)} - \sum_{i,j,s} P_i \, p_{ij}^{(s)} \log_2 B_j + \sum_{i,j,s} P_i \, p_{ij}^{(s)} \log_2 B_i$$

For a somewhat obscure reason, possibly the assumption of
commutativity, i.e., $p_{ij}^{(s)} = p_{ji}^{(s)}$, then

$$\sum_{i,s} P_i \, p_{ij}^{(s)} = P_j$$

and the two sums in B cancel. This choice of probability has maximized the entropy, which is
now proportional to the channel rate

$$H = \log_2 W \sum_{i,j,s} P_i \, p_{ij}^{(s)} \, l_{ij}^{(s)}$$

If l is rated in time units, then the entropy per unit length is

$$\frac{H}{\sum_{i,j,s} P_i \, p_{ij}^{(s)} \, l_{ij}^{(s)}} = \log_2 W \text{ bits per unit time.}$$
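
The maximizing assignment is easiest to see in the special case of a transducer with a single state, where all the B's are equal and the symbol probabilities reduce to W raised to minus the symbol length. The sketch below is our own illustration under that assumption, with three hypothetical symbols of lengths 1, 2 and 2, for which W = 2 is the root.

```python
import math

# Hypothetical one state transducer: three symbols of lengths 1, 2, 2.
lengths = [1, 2, 2]
W = 2.0   # satisfies W**-1 + W**-2 + W**-2 == 1

def rate(probs, lengths):
    """Entropy per unit length: H divided by the mean symbol length."""
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    mean_length = sum(p * l for p, l in zip(probs, lengths))
    return H / mean_length

# Maximizing assignment p_s = W**(-l_s).
best = [W ** -l for l in lengths]

# Any other assignment, e.g. equal probabilities, gives a lower rate.
equal = [1.0 / 3.0] * 3

print(best, rate(best, lengths), rate(equal, lengths))
```

With the maximizing probabilities the rate equals log2 W; the equal assignment wastes time on the long symbols and falls short of it.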


25. Having established a criterion by which the maximum value of
the flow of 'entropy' of a message source can approach the channel capacity
of a discrete transducer, Shannon enunciates his "fundamental theorem for a
noiseless channel" that a source with entropy H bits per symbol and a transducer
and channel with a capacity C bits per second can be encoded to transmit
at the average rate of C/H symbols per second, but not greater.

We have already shown that $H/\bar{l}$ (the entropy per unit length) of the
transducer and channel can only be maximized at the channel capacity

$$H/\bar{l} \le C = \log_2 W.$$

However at most (if the transducer is 'non-singular,' i.e., a second transducer
can be constructed and connected that will recover the input of the
first transducer from its output) the entropy in source output and transducer
output are equal, so that

$$H/\bar{l} \le C$$

for the source.


To  prove  the  equality  requires  special  encoding,  i.e.,  demon¬ 
stration  that  the  required  symbol  probabilities  are  achieved.  Shannon  demon¬ 
strates  2  such  codings,  attributing  one  of  those  also  to  Fano.  Another  sys¬ 
tematic  method  which  has  become  known  as  minimum-redundancy  codes  was  developed 
by  Huffman  (1952).  Basically  they  all  seek  to  encode  common  high  probability 
'symbols'  with  short  duration  sending  units,  and  low  probability  symbols  with 
longer  duration  sending  units. 

One must note (see Cherry (51), p. 36) that Morse's code had a
considerable appreciation of this fact on an empirical basis.

Since this is regarded as one of the cornerstones of information
theory - Shannon's first or fundamental theorem on noiseless discrete
coding - it is worth considerable discussion and explanation.




First,  we  may  consider  a  message,  things  like 
DEAR  MOM,  I'M  COMING  HOME  CHRISTMAS;  SEND  MORE  MONEY,  etc. 

Second,  we  may  consider  a  transducer,  things  like 
a  two  position  switch 

a  two  position  switch  with  a  spring  return  to  open 
an  n-position  switch 

an  n-position  switch  with  a  sequenced  open-close  cycle,  etc. 

In  considering  'sending  units'  which  may  have  to  bring  in  the 
physical  limitations  of  the  network,  Shannon  has  slurred  these  over.  Thus  he 
more  nearly  views  'symbols’  as  a  complex  of  sending  units,  with  what  seems  an 
undefined  but  implicit  assumption  that  the  transducer  and  network  have  already 
been  selected  for  the  unit  of  sending  time.  Symbols  are  to  be  rated  by  dura¬ 
tions  of  sending  time  units.  Further  -  in  this  discrete  system  discussion  - 
he  recognizes  a  set  of  finite  symbols,  the  source's  alphabet.  However  there 
is  little  indication  that  the  transducer  and  channel  sending  units  are  any¬ 
thing  but  binary  states  of  on  and  off.  The  discussion  seems  always  centered 
on  encoding  the  'message'  of  the  source  which  may  have  'words'  which  are  made 
up  of  source  'letters,'  and  represented  by  a  source  alphabet,  or  better  by  a 
source dictionary. We can explain things by saying that the dictionary is
made up of letters and words, and messages by a source alphabet of letters.
These dictionary entries may then be encoded by transducer symbols.

What are the transducer symbols - in the present instance the
discrete symbols? From Shannon, they appear to be a timed sequence of sending
units that make up a finite sequence of symbols. One presumes that he viewed
these sending units as both discrete physical switch states and associated
electrical voltages or currents. Thus one might consider a symbol as defined
by a sequenced block of m-ary steps that take n $t_0$-time units.

The issue of constraints on the switch state - whether all symbols are accessible
to call, or whether there are 'memorized' rules of what symbol subsequences
are possible or not possible - can be buried for the time in a grander
array  of  symbols,  i.e.,  the  transducer  can  be  extended  physically  to  include 
symbols  that  will  make  it  a  one  state  entity.  Thus,  consider  Shannon's  ex¬ 
ample  of  four  symbols  A,B,C,D,  -  dot,  dash,  letter  space,  word  space  -  in 
which  after  A  or  B  you  are  in  state  1  and  can  choose  symbols  A,B,C,  or  D;  but 
after  C  or  D,  you  are  in  state  2  and  can  only  choose  A  or  B.  We  can  change 
this  to  a  six  symbol  alphabet  -  dot,  dash,  dot  plus  letter  space,  dot  plus 
word  space,  etc.  -  which  will  always  be  in  one  state.  If  any  one  objects, 
this  may  be  considered  to  be  a  compound  symbol. 

Suppose first that there were 28 symbols all of equal duration
of 5 $t_0$-time units. Then W, of which log W is the channel capacity,
is given by $W^5 = 28$, or W is near 2. Basically W is the number of elementary transducer
states that can form the symbols. There are, in this case, 2 states. However,
suppose as per Nyquist's or Hartley's wish, we had used 5 states, then we
could code the 28 symbols more nearly into 2 $t_0$-time units.

Shannon's  computational  rate  is  an  'exact'  rule  for  computing 
W.  However  it  is  not  really  much  other  than  an  extension  of  Hartley's  rule 
for  relating  sending  units,  or  primary  symbols  or  machine  letters,  etc.  to  the 
number  of  sequences,  here  machine  symbols. 

Now we must get the meaning of W if there is more than one
unit of time involved. For example, if there are two units of time such as
14 symbols of 1 time unit and 14 symbols of 5 time units, or 3 and 5, then

$$1 = 14 W^{-1} + 14 W^{-5}$$

from which W = 14 approximately, so that it is only the 1 time unit symbols
that count because the other symbols are so sparse. Even in the second case

$$1 = 14 W^{-3} + 14 W^{-5}$$

$W = 14^{1/3}$ to within 6%.

This  is  discussed  at  greater  length  in  Brillouin  (41).  Never¬ 
theless  W  may  be  regarded  as  the  effective  number  of  elementary  transducer 
discrete  states  used  for  sending.  Then  capacity  is  defined  as  the  log  W. 
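
The two numerical cases above can be confirmed by solving for W directly. This is a minimal sketch using plain bisection; the function `solve_W` and its bounds are our own choices.

```python
def solve_W(terms, iters=200):
    """Largest real root of sum(count * W**-duration) == 1.

    `terms` is a list of (count, duration) pairs with durations >= 1,
    so the root lies between 1 and the total symbol count plus 1.
    """
    lo, hi = 1.0, float(sum(c for c, _ in terms)) + 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(c * mid ** -d for c, d in terms) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# 14 symbols of 1 time unit and 14 symbols of 5 time units.
W_first = solve_W([(14, 1), (14, 5)])

# 14 symbols of 3 time units and 14 symbols of 5 time units.
W_second = solve_W([(14, 3), (14, 5)])
cube_root = 14 ** (1.0 / 3.0)

print(W_first, W_second, W_second / cube_root)
```

In the first case the long symbols raise W above 14 only in the fourth decimal place; in the second the root lies about 5 percent above the cube root of 14.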

Now it does not make sense to use $\log_2$ unless W is effectively
2. Then capacity would become $1/t_0$ binary units per unit time. However, suppose
23 symbols were sent with only 2 equal time units, so that W is about 5, it is
more nearly true that the transducer and channel 'capacity' was $1/t_0$ 5-ary
units than $(1/t_0) \log_2 5$ binary units. Nevertheless, if one wishes to follow
the convention in the field, it is necessary to use $\log_2$ as the measure of
'capacity.' This is a statement that the communications engineer still regards
the ultimate encoding to be in a binary switch state device.

Thus channel and transducer 'capacity' are to be regarded,
roughly,  as  the  number  of  on-off  states  per  second  that  can  be  encoded  and 
delivered  with  reasonably  good  resolution.  At  present,  the  'reasonably  good' 
is  perfect.  If  instead  of  delivering  symbols  with  an  equal  number  of  on-off 
elements,  there  is  a  weighting  -  which  can  be  estimated  from  a  long  message 
of  symbols  -  in  favor  of  the  preponderant  number  of  shorter  time  symbols  that 
determines  a  number  of  states  somewhat  different  from  the  2  on-off  states,  or 
alternately,  if  m-ary  states  are  permitted,  the  'capacity'  will  be  similarly 
defined.  However,  Hartley's  rule  will  be  taken  into  account  and  the  informa¬ 
tion  rate  in  binary  unit  states  will  be  increased  by  the  log2  of  the  number 
of  states. 


Thus,  whereas  at  the  start,  it  wasn't  clear  what  made  up  the 
'capacity' of a transducer and channel; now it is the elemental 'sending units'
of time element $t_0$ - which is tied to the bandwidth of the channel - which is
to  be  reckoned  with  for  capacity.  But  we  must  similarly  reckon  with  the  in¬ 
cremental  sensitivity  in  time,  which  Shannon,  up  to  this  point,  has  not  de¬ 
fined  well.  Although  there  was  the  ambiguous  point  in  Nyquist's  paper  that 
it  paid  to  use  more  than  two  states,  but  their  'cost'  might  be  prohibitive, 
and  in  Hartley's  paper,  that  information  rule  was  proportional  to  the  log  of 
the  number  of  primary  signals,  yet  Hartley  chose  to  prescribe  a  binary  unit 
for  'information.' 

Now,  if  we  regard  the  channel  as  being  capable  of  C'  m-ary 
units  per  second  or  C  binary  units  per  second,  we  come  to  Shannon's  first 
theorem,  that  the  information  source  can  be  encoded  to  where  it  is  transmit¬ 
ting  the  greatest  amount  of  'information,'  using  the  given  transducer  and 
channel  symbols.  However  this  cannot  be  done  by  letting  the  source's  'alpha¬ 
bet'  be  identical  to  the  transducer's  alphabet.  We  must  remember  that  the 
greatest  amount  of  information  means  solely  the  least  amount  of  time.  Its 
success depends on a priori probability information or a posteriori probability
information developed as time goes on from similar sources. This is the
meaning of the ergodic source hypothesis. We will illustrate how this is
done. 


Suppose we have a source that uses a four letter alphabet A,
B,C,D with probabilities 1/2, 1/4, 1/8, 1/8, where successive symbols are
chosen independently with no constraints. Suppose the transducer and channel
had only an equal time binary unit capability of say 2 binary units per second.
If we encoded the alphabet A = 00, B = 01, C = 10, D = 11, then a characteristic
message, such as AAAABBCC, would take 8 seconds, or 2 binary units per
sending unit. Now, if we measure the entropy

$$H = -\left( \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{4} \log_2 \tfrac{1}{4} + \tfrac{2}{8} \log_2 \tfrac{1}{8} \right) = \tfrac{7}{4} \text{ binary units per sending unit}$$

it should be possible to approximate a code to achieve this level. This is
Shannon's example. He shows that A = 0, B = 10, C = 110, D = 111 will do this,
for the characteristic message will be 00001010110110, taking only 7 seconds.
The binary digit sending units 0, 1 now have probabilities 1/2, 1/2. The
maximum possible entropy for the original set is $\log_2 4 = 2$ if A,B,C,D had
equal probabilities. Thus the relative entropy is 7/8. The basic thing to
note is that the sending duration for the symbol has taken note of the probability
to make common symbols shorter in time.
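
Shannon's four letter example is small enough to run through mechanically. The sketch below encodes the characteristic message with both codes and compares the average code length with the entropy; the helper `encode` and the variable names are ours.

```python
import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable_code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(message, code):
    """Concatenate the code words of the message symbols."""
    return "".join(code[s] for s in message)

message = "AAAABBCC"
sent_fixed = encode(message, fixed_code)
sent_variable = encode(message, variable_code)

# At 2 binary units per second:
seconds_fixed = len(sent_fixed) / 2
seconds_variable = len(sent_variable) / 2

# Entropy and average code length, in binary units per symbol.
H = -sum(p * math.log2(p) for p in probs.values())
average_length = sum(probs[s] * len(variable_code[s]) for s in probs)
relative_entropy = H / math.log2(len(probs))

print(sent_variable, seconds_fixed, seconds_variable)
print(H, average_length, relative_entropy)
```

The variable length code meets the entropy exactly here only because every probability is a power of 1/2; for other sources the average length can only approach it.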

Referring to these as minimum-redundancy codes, Bell (42) illustrates
as follows: we might encode 26 English letters into a 5 unit binary
code, requiring 5 binary units per symbol, or, from a certain number $1/t_0$ of
binary units per second, a number of symbols per second. If we take into
account the probability of English letters, including spaces, Reza (44) gives
us the entropy 4.03 binary units per symbol. If we disregard the relative
frequencies, then it would require 4.76 binary units per symbol. Bell
(42) illustrates a minimum redundancy code on the Shannon-Fano principle for
the 26-letter alphabet which for English probabilities requires 4.16 digits
per symbol; or mentions a Gilbert-Moore encoding of the 26 letters plus space
with 4.12 digits per symbol. These numbers indicate some measure of the degree
to which a gain in information rate can be obtained by specialized coding that
fulfills the Shannon coding theorem; namely a reduction from 4.8 units per
symbol to near 4.1 units per symbol by encoding using letter probabilities.

We can illustrate the Huffman method of coding, which is a most
efficient code for a set of symbols having different probabilities, from Pierce
(21). He lists a series of words of different probabilities. Array these in
order of monotonic decreasing probability

Symb.  Prob.     Symb.  Prob.     Symb.  Prob.     Symb.  Prob.     Symb.  Prob.

H      (.50)     H      .50       H      .50       H      .50       H      .50
G      (.15)     G      .15       G      .15       G      .15       FE     .22
F      (.12)     F      .12       F      .12       DCBA   .13       G      .15
E      (.10)     E      .10       E      .10       F      .12       BADC   .13
D      (.04)     B,A    .05       D,C    .08       E      .10
C      (.04)     D      .04       B,A    .05
B      (.03)     C      .04
A      (.02)


Adding the two minimum probabilities and considering the symbol as one and then
reordering the one fewer number of symbols, one may proceed by such a sequence
to a unity set. Now construct a tree with branches 1 and 0 from the unity
probability, labelling each branch 'above' 1, and 'below' 0.

[Figure: code tree rooted at the unity probability (1.00), upper branches labelled 1, lower branches labelled 0.]

                      Av. No. digits
Symb.    Code         per Symbol

H        1            .50
G        001          .45
F        011          .36
E        010          .30
D        00011        .20
C        00010        .20
B        00001        .15
A        00000        .10
                      ----
                      2.26

This code gives 2.26 digits per symbol. If we had used a 3 digit code for
these 8 symbols, it would have required 3 digits per symbol. The entropy is
2.21 digits per symbol. This again illustrates how close one can come with
a minimum redundancy code like the Huffman code. The theory is discussed
more fully in Abramson (43).
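
The reduction and tree construction can be carried out mechanically. The following is a minimal sketch of the binary Huffman procedure applied to the eight probabilities above, entered as integer hundredths to avoid rounding ties; the function name and data layout are our own. It returns only the code lengths, since the assignment of 1 and 0 to the branches is a matter of convention.

```python
import heapq
import itertools
import math

# Probabilities in hundredths, from the first column of the reduction.
weights = {"H": 50, "G": 15, "F": 12, "E": 10, "D": 4, "C": 4, "B": 3, "A": 2}

def huffman_lengths(weights):
    """Return {symbol: code length} for a binary Huffman code."""
    tie = itertools.count()   # tie-breaker so heap entries always compare
    heap = [(w, next(tie), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)   # two least probable entries
        w2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:             # each merge adds one digit
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths

lengths = huffman_lengths(weights)
average_digits = sum(weights[s] * lengths[s] for s in weights) / 100
entropy = -sum((w / 100) * math.log2(w / 100) for w in weights.values())

print(lengths)
print(average_digits, round(entropy, 3))
```

The code lengths and the 2.26 average agree with the table. Evaluating the entropy directly from these eight probabilities gives about 2.25 rather than the quoted 2.21; the difference may reflect rounding in the source and does not affect the comparison.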

In  commenting  on  the  particular  Huffman  code  for  the  26  let¬ 
ter  alphabet,  Bell  (42)  makes  the  comment,  validly  in  our  opinion,  "the  rather 
complicated  coding  ...  leads  to  a  straight  average  length  of  5.65  digits  per 
character,  and  an  English-language  weighted  average  of  4.16  digits  per  char¬ 
acter,  an  advantage  over  the  5-unit  code  which  is  clearly  not  sufficient  to 
justify  the  complication."  This  should  be  compared  with  Abramson's  statement 
in  another  illustration  of  encoding  compression,  "We  have  thus  shown  that  it 
is  possible  to  transmit  the  same  type  of  information  ...  using  about  6  percent 
fewer  binits  (binary  digits)  per  message,  on  the  average.  A  reduction  of  6 
percent in the number of binary digits to be transmitted in a practical communication
system is a gain of some importance."

(This  characterizes  the  quality  of  two  extreme  views  of  infor¬ 
mation  theory.  Some  authors  -  see  for  example  Reza's  introduction  (44)  - 
have regarded information theory as a subject completely embedded in the theory
of mathematical statistics, and to them the excitement has lain in the direction
of the rigor and theorematization of McMillan, Khinchine, Feinstein, and
Wolfowitz,  et  al  (48).  To  others  -  including  us  tentatively  -  its  value  exists 
in  it  being  a  useful  adjunct  to  communications  theory  in  suggesting  or  remind¬ 
ing  one  of  various  probabilistic  elements  of  'messages.' 

For  example,  Brillouin's  (41)  assessment  of  an  example  in 
ternary coding, using sending units +1, 0, -1 in which he shows 3.3 units
(ternary units) per symbol for 26 English symbols plus space by a somewhat poor
coding, and indicates that the number of bits per symbol $3.3 \log_2 3 = 5.25$ is
quite  a  bit  higher  than  the  4.0  to  4.65  that  can  be  obtained  with  some  binary 
codes,  misses  the  point,  that  the  concern  is  with  getting  the  maximum  informa¬ 
tion  about  messages  through  in  the  shortest  time  -  commensurate  with  a  band¬ 
width  limitation  for  the  channel.  It  was  Nyquist's  point  to  argue  out  various 
pro  and  con  'costs';  however  the  binary  measure  is  just  an  artifice.) 

A  much  more  incisive  discussion  of  m-ary  minimum  redundancy 
codes  is  given  in  Abramson  (43) . 

26.  Huffman  investigated  the  problem  of  compact  or  minimum  redun¬ 
dancy  codes  for  both  binary  as  well  as  m-ary  codes  in  1952.  This  is  discussed 
in  Abramson  (43).  Their  construction  is  similar  to  the  construction  for  binary 
codes,  in  a  reduction  of  an  alphabet  with  various  probabilities  by  combining 
the  symbols  one  at  a  time.  Dummy  symbols  with  zero  probability  may  have  to  be 
added. 


To give a comparison of compact codes for m-ary coding, Abramson
gives an example of 13 symbols with attendant probabilities - 1/4 1/4 1/16 1/16
1/16 1/16 1/16 1/16 1/16 1/64 1/64 1/64 1/64 - and estimates the 'code lengths,'
i.e., the 'channel capacity' for particular compact codes as 3.3 binary digits
per symbol for binary coding;

                         Sending rate, symbols/sec
                         (if channel can transmit n     Sending rate
m-ary digits             sending units per second)      (if no
per symbol        m                                     compact code)

3.3               2      .32 n                          .25 n
2.0               3      .48                            .33
1.6               4      .64                            .50
1.4               5      .69                            .50
1.4               6      .74                            .50
1.2               8      .84                            .50
1.1               10     .94                            .50
1.0               12     .97                            .50
1.0               13     1.00                           1.00


(The table illustrates Nyquist's point. First, it shows when there is real
gain from compact codes; and second, what gain there is from m-ary symbols.
The gains are appreciable for ternary and quaternary symbols, and perhaps
greatest in going from a non-compact binary code to a compact ternary code.
However, this bears out that the problem is only mildly a coding problem and,
in the main, a 'cost' design problem.)
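
The reduction described in paragraph 26 (combine the least probable symbols,
padding with zero-probability dummy symbols when needed) can be sketched as
follows. This is only a minimal illustration in Python, not Huffman's or
Abramson's own presentation; the function name is our own, and the
probabilities are those of the 13-symbol example quoted above. It computes the
average number of m-ary digits per symbol, whose reciprocal is the sending
rate in symbols per sending unit.

```python
import heapq
from fractions import Fraction
from itertools import count


def compact_code_average_length(probabilities, m):
    """Average m-ary digits per symbol for a compact (Huffman) code."""
    weights = [Fraction(p) for p in probabilities]
    # Pad with zero-probability dummy symbols so that every reduction
    # combines exactly m symbols: the count must be 1 modulo (m - 1).
    while (len(weights) - 1) % (m - 1) != 0:
        weights.append(Fraction(0))

    tie_breaker = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tie_breaker)) for w in weights]
    heapq.heapify(heap)

    average_length = Fraction(0)
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap)[0] for _ in range(m))
        # Each reduction adds one digit to every symbol under the merged
        # node, so the merged weight is its contribution to the average.
        average_length += merged
        heapq.heappush(heap, (merged, next(tie_breaker)))
    return average_length


# The 13-symbol distribution of the example above.
EXAMPLE = ([Fraction(1, 4)] * 2) + ([Fraction(1, 16)] * 7) + ([Fraction(1, 64)] * 4)

for m in (2, 3, 4, 12, 13):
    length = compact_code_average_length(EXAMPLE, m)
    print(m, float(length), round(float(1 / length), 2))
```

Exact fractions are used so that the averages are not blurred by rounding;
the printed rate is in symbols per sending unit, i.e., the multiplier of n.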

27.  In  order  to  make  real  gain  in  information  coding,  the  struc¬ 
ture  of  language,  as  begun  by  Markov  chains,  must  be  taken  into  account. 

(We  can  anticipate  the  very  elementary  conclusion  that  will 
come  at  the  end  of  this  section,  that  it  is  much  more  compact  to  speak  in 
words  than  in  letters.  In  later  sections,  as  we  explore  the  content  of  human 
information,  we  will  find  it  is  more  compact  to  speak  in  ideas  than  in  words, 
and  ultimately  in  the  section  on  the  brain,  we  will  speculate  that  speaking 
is  done  more  often  according  to  the  major  poles  of  human  behavior  than  in 
ideas. Thus gradually the 'perfection' of digitized or quantized data or
information will fade as the greater perfection of the analogous nature of the
sources emerges.)


Brillouin  (41)  for  example,  illustrates  some  of  the  known  re¬ 
sults  on  language  redundancy.  For  English  chains  there  is  required 

                                            'entropy'
                                            (binary digits per letter)

all letters and space equiprobable          4.76
using probabilities of letters              4.03
probabilities of groups of 2 letters        3.32
probabilities of groups of 3 letters        3.1
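
The first entry of this table can be checked directly. The sketch below is an
illustration only: the function name is our own, and the skewed four-symbol
distribution is a made-up example, not English letter data. It shows how the
'entropy' per symbol falls once the symbols are no longer equiprobable.

```python
import math


def entropy_bits(probabilities):
    """Entropy in binary digits per symbol of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# 26 letters plus space, all equally likely (first row of the table).
ALPHABET_SIZE = 27
equiprobable = [1 / ALPHABET_SIZE] * ALPHABET_SIZE
print(round(entropy_bits(equiprobable), 2))

# Hypothetical skewed source: unequal probabilities lower the entropy
# below log2(4) = 2 binary digits per symbol.
skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(skewed))
```

The later rows of the table need measured letter, digram and trigram
frequencies, which are not reproduced here.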


If  now,  as  was  done  by  Shannon  in  1951,  the  question  is  raised  on  what  is  re¬ 
quired  for  a  letter  after  the  previous  letters  are  known,  instead  of  the  4.8 
binary  digits  per  letter,  the  number  quickly  drops  -  experimentally  to  an 
upper  bound  of  about  2  binary  digits  per  letter  for  as  few  as  8  letters,  and 
likely  approaches  a  limiting  upper  bound  of  1.4  binary  digits  per  letter  for 
long  messages.  A  lower  bound  quickly  approaches  1,  and  ultimately  0.6  binary 
digits  per  letter.  The  limits  0.6  to  1.4  as  compared  to  4.8  are  viewed, 
generally, as the degree to which English is redundant. (In letters, the basic
compression in this direction is that of considering what probabilistic chains
we carry in our heads. It is represented really by such compressions as SND
MR MNY, i.e., stenographic codes that are privately used, or if not too
compressed, can be passed between 'experts.' However the objection, in a
variety of illustrations of too much compression, is that one stenographer
cannot really read another's complex dictation. We note this as a matter of
experimental test - there is a newspaper game in a number of papers which tests
one's ability to guess the appropriate vowel in various 'ambiguously' defined
words; the layman can't understand the shorthand of the expert; more telling -
in having attended a few thousand technical talks - most of the audience cannot
really follow the detailed technical content of any talk!)


28. Presumably making use of the experimental results that Zipf
presents in his 1949 book, Shannon (1951) cast some light on the content of
English messages, taking words into account.

English letters can be encoded by about 5 binary digits per letter and, by
count, the average length of a word is about 5-1/2 letters, so that about 27.5
binary digits are required per word. However, if we consider what a large
competent English dictionary consists of, we may conclude that
16,000-32,000-64,000 words are large to exhaustive dictionaries. Coded in
binary form, this could amount to about 14-15-16 binary digits per word, i.e.,
near 14 practically, or near 3 binary digits per letter. Now we may consider
the moderate effect of more compact coding.
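
The arithmetic of this estimate can be laid out as a short sketch. The
function name is our own; the dictionary sizes and the 5.5 letters per word
are the figures quoted in the text.

```python
import math

LETTERS_PER_WORD = 5.5      # average word length quoted in the text
BITS_PER_LETTER = 5         # letter-by-letter coding quoted in the text


def bits_per_word(dictionary_size):
    """Binary digits needed to give each dictionary word its own fixed-length code."""
    return math.ceil(math.log2(dictionary_size))


# Letter-by-letter coding of an average word.
print(BITS_PER_LETTER * LETTERS_PER_WORD)

# Whole-word coding for dictionaries of increasing size.
for size in (16_000, 32_000, 64_000):
    bits = bits_per_word(size)
    print(size, bits, round(bits / LETTERS_PER_WORD, 2))
```

Coding by whole words thus roughly halves the digits per letter before any
use is made of unequal word probabilities.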

Zipf (see Cherry (51)) studied the occurrence of words from
Joyce's ULYSSES and from American newspapers and found approximately

$$ p_n = \frac{0.1}{n} $$

p_n = probability
n = rank order.

A rationalization of this law was offered by Mandelbrot - see, for example,
Cherry (51), Chapter 5. Shannon presents such a chart for 8727
words. The most common words, with probabilities up to the 10% level, are
(rank order in parentheses) THE (1), OF (2), AND (3), TO, I, OR (13),
SAY (18), REALLY (21), QUALITY, etc.

From this he finds an entropy of 11.8 binary digits per word, or at 5.5
letters including spaces per word, 2.14 binary digits per letter. It is this
level that is a measure of what may be achieved by compact coding of words.

29.  Having  thus  far  sought  to  view  coding  schemes  for  eliminating 
redundancy  in  messages  and  to  design  codes  using  the  smallest  number  of  m-ary 
sending  units  per  letter,  we  find  there  are  times  that  redundancy  is  used  for 
various  checking  purposes.  Error  detecting  codes  and  correcting  codes  are 
discussed  in  Brillouin  (41),  Bell  (42),  Abramson  (43),  Pierce  (21).  Their 
search  was  instituted,  presumably  starting  with  Golay  (1949),  and  Hamming 
(1950). However Shannon's theorem of the likelihood of good transmission in
the face of noise provided the basis for such search. Thus this problem serves
as a plausible transition to Shannon's second theorem.

(Error free codes, by the use of redundancy, can stretch from
such primitive examples as repeating each symbol twice or three times; to such
a scheme as shown by Pierce in which 8 check symbols are used to check each
group of 16 symbols as a parity check by rows and columns of the 16 symbols
arranged as a 4 by 4 matrix; to the Hamming method, etc. See these references
or (50) for more detail.)
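
The row-and-column parity scheme attributed to Pierce above can be sketched
as follows. This is a minimal illustration under our own assumptions (binary
symbols, even parity, the 8 check digits appended after the 16 data digits);
the function names are hypothetical and the layout is not taken from Pierce.

```python
def encode_block(bits):
    """Append 4 row parities and 4 column parities to 16 data bits (4 by 4)."""
    if len(bits) != 16:
        raise ValueError("expected 16 data bits")
    rows = [bits[r * 4:(r + 1) * 4] for r in range(4)]
    row_parity = [sum(row) % 2 for row in rows]
    col_parity = [sum(rows[r][c] for r in range(4)) % 2 for c in range(4)]
    return list(bits) + row_parity + col_parity


def correct_single_error(block):
    """Return the 16 data bits, repairing at most one flipped digit."""
    data = list(block[:16])
    row_parity, col_parity = block[16:20], block[20:24]
    rows = [data[r * 4:(r + 1) * 4] for r in range(4)]
    bad_rows = [r for r in range(4) if sum(rows[r]) % 2 != row_parity[r]]
    bad_cols = [c for c in range(4)
                if sum(rows[r][c] for r in range(4)) % 2 != col_parity[c]]
    # One failing row check and one failing column check locate a data error.
    # A single failing check alone means a check digit itself was hit.
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        data[bad_rows[0] * 4 + bad_cols[0]] ^= 1
    return data


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
block = encode_block(message)
received = list(block)
received[6] ^= 1                      # one digit damaged in transmission
print(correct_single_error(received) == message)
```

The price is 8 check digits per 16 message digits, which is why such a scheme
is simple but far from the economy that Shannon's theorem promises.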


30. Shannon's second theorem (18), "The Fundamental Theorem for a
Discrete Channel with Noise," is set in the following framework. If a channel
is noisy, the result of m-ary sending units supplied as the input to the
channel by the transducer will be uncertain. He discusses this in terms of
the errors among large numbers of binary digits per second. Let H(x) be the
'entropy' of the set of symbols of the input; H(y) for the set of symbols in
the output. If there is no noise, H(x) = H(y). If there is noise, then it is
the joint entropy H(x,y) which will be conserved.


$$ H(x,y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) $$


p(i,j) is the probability of the joint occurrence of the ith symbol in the x
alphabet and the jth symbol in the y alphabet.

However this joint entropy will be the entropy of source input
or channel output augmented by 'conditional entropies' Hx(y), Hy(x) such
that

$$ H(x,y) = H(x) + H_x(y) = H(y) + H_y(x) $$

where

$$ H_x(y) = -\sum_{i,j} p(i,j) \log_2 p_i(j), \qquad p_i(j) = \frac{p(i,j)}{\sum_j p(i,j)} $$

The rate of actual transmission R is

$$ R = H(x) - H_y(x) $$

Hy(x) is called the 'equivocation.' It measures the ambiguity of the received
signal. (Shannon's illustration is an error of 1 in 100 for a two symbol
alphabet, 1 or 0, when these are equiprobable. The equivocation
Hy(x) = - (.99 log2 .99 + .01 log2 .01) = .08 binary digits per symbol, where
the 'entropy' is 1 binary digit per symbol.)
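
Shannon's illustration can be reproduced numerically. The sketch below is our
own illustration, with hypothetical names; it builds the joint probabilities
p(i,j) for a two-symbol channel with an error of 1 in 100 and equiprobable
inputs, and evaluates the joint entropy, the two conditional entropies, and
the rate R from the definitions given above.

```python
import math

ERROR_RATE = 0.01                 # 1 error in 100, as in Shannon's illustration
P_INPUT = {0: 0.5, 1: 0.5}        # equiprobable input symbols

# Joint probability p(i, j) of sending i and receiving j.
joint = {(i, j): P_INPUT[i] * (1 - ERROR_RATE if i == j else ERROR_RATE)
         for i in (0, 1) for j in (0, 1)}


def entropy(probabilities):
    """Entropy in binary digits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def conditional_entropy(joint, given_index):
    """Entropy of the other variable when the variable at given_index is known."""
    marginal = {}
    for pair, p in joint.items():
        key = pair[given_index]
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p / marginal[pair[given_index]])
                for pair, p in joint.items() if p > 0)


h_x = entropy([sum(joint[i, j] for j in (0, 1)) for i in (0, 1)])
h_y = entropy([sum(joint[i, j] for i in (0, 1)) for j in (0, 1)])
h_xy = entropy(joint.values())
noise_entropy = conditional_entropy(joint, 0)   # Hx(y): output given input
equivocation = conditional_entropy(joint, 1)    # Hy(x): input given output
rate = h_x - equivocation

print(round(equivocation, 3), round(rate, 3))
```

The two decompositions of H(x,y) agree to rounding error, and the equivocation
comes out at about .08 binary digits per symbol, as quoted.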

Following Pierce (21), p. 164, we note that the greatest possible rate of
transmission, i.e., a new definition of channel capacity for a noisy channel,
will be this rate of 'entropy' minus 'equivocation.' This is the sense of
Shannon's auxiliary theorem 10, which says nothing else than that if an
'omniscient' observer were present - observing both input and output - he
could send back through a correction channel just the correction for the
equivocation error, with negligible error. Shannon indicates that this
provides an upper bound for capacity. The point of Shannon's theory thus
emerges, as Pierce puts it by example: if, in transmitting 100 symbols in a
channel in which the equivocation is 0.08 binary digits per symbol, the
channel capacity, at most, might be 92 correct nonredundant digits in this
noisy channel, we can use a redundant code using not more than 8 digits per
100 digits, so that in long sequences of 100 digits delivered to this noisy
channel, we will get nearly 92 correct nonredundant digits. Thus the issue of
checking codes has been joined with that of noisy channels.

In order to encode messages free of error, we must code by long symbols,
i.e., by the type of 'extensions' that Abramson discusses (43), or by large
block encoding. Previously we were concerned with removing redundancy in
various ways by examining messages in large blocks. Now we are concerned with
reintroducing redundancy into blocks in small amounts so as to overcome noise
equivocation.

In principle, we do not lose capacity by more than the noise
equivocation, and it is not true that we have to trade channel capacity for
reliability. 'Equivocation' in the transducer and channel for the message it
handles determines the loss in channel capacity. Coding, then, may bring up
the reliability to the reduced channel capacity.

(In this context the literary discussion about Shannon's results becomes
more meaningful. Obviously, 'equivocation,' or 'error' in our
cruder  metrological  sense  reduces  the  amount  of  information  that  can  be  trans¬ 
mitted.  We  now,  however,  begin  to  have  a  better  idea  of  what  this  entire 
discussion  of  information  theory  had  as  its  direction.  'Information  theory' 
says  that  we  must  regard  each  added  digit  as  a  piece  of  information, 

980.665  dynes/cm^  has  6  decimal  digits;  100.2  has  4,  etc. 

Two numbers, 980.665; 134.6 have 10. We would not concede this in
metrological theory. We recognize that it is a clerical-legalistic judgment
that says the content of 6 place numbers is not to be judged by the
transmission network - or the telephone company. However this is precisely one
way in which much of the nonsense about scientific information creeps in, by
reports of meaningless numbers. Legally, we know assets are reported as
$121,142,321.26 but, practically, we know that the real certainty probably
fluctuates quite wildly in the 4th significant figure. The essence of the
matter is likely the degree of involvement, or - to borrow a term - the degree
of interaction.

It is our complaint that the computer analysis - by digital computers - of a
system of non-isomorphic relations, that are simply descriptive, often
irrelevant, redundant, etc., regardless of the largeness of their number, does
not improve 'information' about a system or real 'predictive value.' We take
our 'pure' stand - likely equally quixotically - on the thesis that from a
wrong premise any conclusion follows - if you are clever enough to construct
the line! The message "Dear John, etc. Pay up!" that the boss gives to his
secretary is sufficient for sender and receiver to encode regardless of how
redundant the letter she writes is. Thus we should finally note that a theory
of transmission in the face of noise, a theory of measurement in the face of
error, and a theory of human communication with imperfect source and channel
are not all aspects of the same thing - and particularly not all aspects of a
mathematical theory of stochastic processes, although mathematics can always
provide interesting tools.)

Shannon's second theorem states that if a discrete noisy channel and
transducer has a potential capacity C for transmitting symbols - its symbol
'entropy' less the symbol 'equivocation' - and if there is a source producing
signal 'entropy' at a rate H, then: if H ≤ C, a coding system exists such that
messages can be transmitted with arbitrarily small error; or if H > C, one can
encode so that the equivocation is essentially less than H - C.

However  the  proof  does  not  exhibit  the  coding  system,  only 
that  such  a  code  exists  among  a  group  of  codes.  It  is  this  concept,  that  in¬ 
formation  can  be  transmitted  without  'error'  and  without  loss  of  speed,  except 
for  a  loss  equivalent  to  'equivocation'  (i.e.,  that  it  is  only  the  'equivoca¬ 
tion'  which  is  irreducible)  that  has  generally  been  viewed  in  the  literature 
as  marvelous. 


However, as Shannon pointed out (18) in 1948, an attempt to
obtain a good approximation to ideal coding is generally impractical, and no
explicit descriptions of a series of approximations to the ideal have been
found; and in 1963 Abramson (43) noted, in discussing the theorem, that
Shannon had to introduce the idea of random coding as a coding procedure;
that, looked at more closely, "it is possible to view the coding procedure ...
as really no coding procedure at all"; and that once having arrived at some
fixed code, there is no assurance that it is a good code. Thus the theorem
is little more than an existence proof, and a little less than a constructive
proof. Its proof indicates methods for generating good codes on the average.

Abramson views the situation as less than satisfactory for
the engineer who asks how to design a code that will achieve the reliability
Shannon promises. He states that choosing code words at random - required by
Shannon's 'constructive' proof - may require impractical implementing
equipment, and asks: if the theorem has shown that almost all codes have small
error probability, can one find a deterministic way of producing good codes?
"This is the dilemma which has persisted to mock information theorists since
Shannon's original paper in 1948. Despite an enormous amount of effort
(Peterson, 1961) spent since that time in quest of this Holy Grail of
information theory, a deterministic method of generating the codes promised by
Shannon is still to be found."


Shannon (18) of course pointed out in his discussion that the
50% redundancy in English is likely already built in to allow considerable
noise in transmission. "... the reasonable English sequences are not too far
(in the sense required for theorem) from a random selection."

The concept of Shannon's coding, approximately, is as follows. Suppose that we
had a coding for a very large symbol sequence - this could be achieved by
Abramson's 'extensions,' i.e., by use, not of symbols A, B, etc., but AB, BA,
etc.; ABC, ACD, etc.; ABCD, DBAE, etc.; that these were compact codes, so
that one can approach the channel capacity rate proposed by the first theorem;
that the extensions were continued (this is our view of the likely needs of
the problem) so that the number of supersymbols were sparse (which is true,
say, for 5 letter combinations. For example, Bell (42) estimates one in seven
English words are five letter words, or approximately 10^4 words for a large
10^5 English word dictionary, whereas 26^5 combinations is about 10^7 5-letter
'words.' Thus only 1 in 10^3 combinations are real words); that the coding
among compact codes had the property that 'similar' supersymbols, by the
measure of 'distance' between supersymbols, are far removed or isolated
(Hamming in 1950 gave a simple concept of distance, the Hamming distance,
between two sets of symbols such as binary sets 1111 and 1110, as the number
of places in which they differ); and that the 'correct' supersymbol would be
taken as the 'nearest' symbol to the one received as output, or selected at
random from the essentially equally near ones. Then Shannon's coding theorem
is that coding for these equiprobable supersymbols (it is probably convenient
to think that the compact code for these supersymbols has been recoded into a
constant sending unit code), on the average, for all possible sequences of
supersymbols, will have very little error.

(What he has tried to do is block code 'words,' i.e., groups
of the original source's alphabet, into his common repertoire. However, for
English we know that the common repertoire is words, and somewhat less common,
clichés. Thus really what Shannon is asking for is those alphabet 'extensions'
or blocks that are equiprobable and common. Typically, suppose we had
a 1000 word dictionary of equiprobable 'supersymbols.' These might consist of
letter combinations, words, messages, instructions, etc. What does this
repertoire consist of? It consists exactly of the kind of 'language' we
commonly carry around. It may start out from an a priori description according
to ideal rules, starting from English letter probabilities and word
probabilities, and then as English messages are studied, in a Bayesian sense,
a series of improvements are attempted until a repertoire is developed that
recognizes more equiprobable units, i.e., the improbable ones are lumped into
larger classes to equalize probabilities. Decoding studies then redistribute
the probabilities until a group of high equal probability supersymbols exist,
and another small group of low probability symbols which are lumped into a few
supersymbols in toto. It is necessary to go over this until the error from the
residue of low probability supersymbols is satisfactory. Suppose this is 1000
supersymbols. (This is only an illustration though it likely is not 10,000.)
For example, the question of how does a company take in $121,162,146.32 is not
a penny at a time, but by far fewer Diophantine operations such as $2.98 per
item, and a withholding tax of x percent, etc. English repertoire is limited,
and most metrological or 'measure' information is really similarly limited,
regardless of how many digital computations are done as the difference of very
nearly the same large numbers. Knowing the 1000 symbol repertoire, 10 binary
digit coding can be used. This is very dense. Every 10 place symbol is used.
The Hamming distance is essentially 1. Then, does Shannon's theorem apply?

By this coding, there is no more latitude for using 10,000
symbols. The repertoire quite compactly inhabits message space, the
supersymbols are equiprobable, and there is very little redundancy. However,
with a noisy channel, say at this level now, a few percent of our symbols are
not transmitted with fidelity - regard this 'equivocation' to have been
obtained experimentally, not probabilistically by individual symbols. Can we,
by saving 10, 20, 30 symbols per number for checking, assure the accuracy of
our repertoire? Shannon's theorem says yes. How?

We will illustrate only by the beginning of constructive processes.
Instead of using a ten place binary number for coding, use a twelve
place - or fourteen place - binary number. An 11 place number can code 2000
symbols. We can code the 1000 numbers among these 2000 symbols, so that now
they are not so densely distributed in message space. We can have increased
their Hamming distance to 2. As a simple example, if

00 - A
01 - B
10 - C
11 - D

is a dense 4 symbol code, among the 8 symbol code

000 - A
001
010
011 - B
100
101 - C
110 - D
111

we can code the 4 symbols so that you can recognize a wrong received signal if
it has an error in a single place.
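
The claim for this small example can be checked mechanically. The sketch
below is our own illustration, with hypothetical names: it computes the
smallest Hamming distance within each code and confirms that no single-place
error turns one code word of the sparse code into another.

```python
from itertools import combinations

DENSE_CODE = {"00": "A", "01": "B", "10": "C", "11": "D"}
SPARSE_CODE = {"000": "A", "011": "B", "101": "C", "110": "D"}


def hamming_distance(a, b):
    """Number of places in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code):
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def single_errors(word):
    """All words obtained from word by changing exactly one place."""
    flip = {"0": "1", "1": "0"}
    return [word[:i] + flip[word[i]] + word[i + 1:] for i in range(len(word))]


def detects_all_single_errors(code):
    """True if no single-place error produces another valid code word."""
    return all(damaged not in code
               for word in code for damaged in single_errors(word))


print(minimum_distance(DENSE_CODE), detects_all_single_errors(DENSE_CODE))
print(minimum_distance(SPARSE_CODE), detects_all_single_errors(SPARSE_CODE))
```

A distance of 2 allows a single error to be recognized but not repaired,
since a damaged word can be equally near two different code words.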

With a 12 place number, we can code the 1000 numbers among
4000 places so that they are even less densely distributed. Gradually, then,
for such sparse spacing, we can improve a sequence of correction codes, with
the hope of ultimately finding one that will be error free. The cost in
transmission rate was only moderate - 10%, 20%, etc. - and in fact Shannon's
theorem states that the cost does not have to be greater than the equivocation
rate, which depends on what percentage and distribution of errors are found.
Better results are then obtained, by the line suggested, in higher code
extensions.

Details  on  'efficient'  codings  will  not  be  discussed  here.  A 
suitable  reference  is  Peterson  (50). 

However one should note the strictures of the various authors.
Pierce (21), for example, points out that to correct n errors, we must find
2^M code groups each at a distance of at least 2n + 1 from every other; that
mathematicians have actually found the best codes; that the general problem of
how to produce the best error-correcting code for given values of M and n has
been solved; but that the longer and more efficient of these highly efficient
codes are too complicated to use, and the simpler codes, correcting only one
error per block, don't help. For example, the chief source of interference is
time dependent bursts that cause errors in several successive digits.
Hagelbarger, of Bell Labs, has shown codes which, by doubling the number of
digits, correct up to six adjacent errors, and which are capable of simple
equipment implementation. This is an inefficient but useful error-correction
method, in contrast to the codes that are efficient mathematically but useless
in engineering.
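
The distance requirement quoted from Pierce can be illustrated on the
smallest scale. The code book below is a hypothetical example of our own
choosing (M = 2 message digits, hence 2^M = 4 code groups of 5 digits, every
pair at distance 3 or more, so n = 1); decoding is by the nearest code group.

```python
from itertools import combinations

# Hypothetical code book: message digits -> 5-digit code group.
CODE_BOOK = {"00": "00000", "01": "01011", "10": "10101", "11": "11110"}


def hamming_distance(a, b):
    """Number of places in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code_groups):
    """Smallest distance between any two distinct code groups."""
    return min(hamming_distance(a, b) for a, b in combinations(code_groups, 2))


def decode(received):
    """Return the message whose code group is nearest to the received word."""
    return min(CODE_BOOK, key=lambda m: hamming_distance(CODE_BOOK[m], received))


d = minimum_distance(list(CODE_BOOK.values()))
correctable = (d - 1) // 2          # n such that d >= 2n + 1
print(d, correctable)
print(decode("01111"))              # "01011" with its third digit damaged
```

With distance 3 any single error leaves the received word nearer to the code
group sent than to any other, which is the sense of the 2n + 1 condition.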

As Bell (42) indicates, the real problem is to fish up the
answer from the signal plus noise; it is not at all obvious how in an
electrical system one carries out the process of fishing, which has the salient
requirement of 'recognition'; the possibility of virtually error-free
communication depends on limiting the vocabulary or code-book to a specific
ensemble of messages; and no recognition system capable of decoding by a
Shannon model has been constructed. Since it is a requirement that message
groups be very extensive, and the set of messages be very large, the
recognition need is extremely onerous and probably renders ideal coding
impracticable. "... it seems that the difference between any practically
realizable communication system and a Shannon system is far greater than the
difference between a practical heat engine and a reversible heat engine."
However, his conclusion is that while the advantage of approximating Shannon's
ideal coding is not very great compared to the complexity of required
apparatus, good results can be obtained by only a modest sacrifice of
signalling speed or gain in signalling power. Moreover, the concept of
'information' as a measurable quantity of a quantized nature, and the relation
between bandwidth and signal to noise ratio, owe a lot to Shannon's work, and
it has led to many other-than-ideal embodiments.)

31. The remainder of Shannon's original theory deals with processing
information on a 'continuous' basis. Recognizing that the input signal -
say speech - has a frequency band limitation f0 and an amplitude limitation A;
that 'white' noise (white as related to the band of the signal) with average
power N exists; that both noise and the signal ensemble are stationary (in
time) with ergodic properties; and that Wiener's contribution (17), by which
randomly selected time series from a stationary domain which are to be
transformed by linear 'communications' networks can be treated by a Fourier
theory combined with the methods of mathematical statistics, furnishes the
mathematical background for such message ensembles; Shannon defines the
entropy for a continuous distribution. He shows that the pass through a linear
'filter' (simply a network that has a response limited to a given band, here
f0) shows an entropy loss that depends on the transfer characteristic over the
frequency band. It is zero for a rectangular bandpass.

If signal and noise are independent, so that the rate of transmission
is defined as the entropy of the received signal less the entropy of
the noise, and the channel capacity is defined as the maximum of the entropy of
the received signal less the entropy of the noise, then maximizing the
transmission rate requires maximizing the entropy of the received signal.
Shannon's third theorem, on 'Channel Capacity with an Average Power
Limitation,' comes about in the following manner. If the noise is white
thermal noise of power N, and power transmitted is limited to a certain
average value P, then P + N is the power received. The maximum entropy
received exists when the received signal forms a white noise ensemble. Then
the received entropy is given by:


$$ H(y) = f_0 \log_2 2\pi e (P + N) $$
$$ H(n) = f_0 \log_2 2\pi e N $$
$$ C = H(y) - H(n) = f_0 \log_2 \frac{P + N}{N} $$

H(n) = entropy of noise
H(y) = entropy of received signal
y = received ensemble
C = capacity

The essence is that the transmitted signals must resemble (not
be) white noise in statistical properties in order to achieve this high rate.

As Shannon points out, similar formulas were derived by Wiener
(see Wiener's CYBERNETICS), Tuller (1949), and H. Sullivan. For peak power
limitations instead of mean power limitations there is even greater complexity.

As Pierce points out (21), the Hartley-Shannon relation

$$ C = f_0 \log_2 \left( 1 + \frac{P}{N} \right) $$

is used, not narrowly to tell how many binary digits per second can be sent
over a particular channel, but to tell something about the possibilities of
transmitting a signal of a specified bandwidth with some required signal to
noise ratio over a communication channel of some other bandwidth and
signal-to-noise ratio. At this point, thus, information theory returns to the
communications theory for which it developed, and the books on communications
theory and noise have greater pertinence.
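
The exchange of bandwidth against signal-to-noise ratio that the
Hartley-Shannon relation describes can be sketched numerically. The function
names and the figures used (a 3000 cycle band at a power ratio P/N of 1000)
are our own assumed example, not values taken from Pierce or Shannon.

```python
import math


def channel_capacity(bandwidth, signal_to_noise):
    """C = f0 log2(1 + P/N), in binary digits per second."""
    return bandwidth * math.log2(1 + signal_to_noise)


def required_signal_to_noise(capacity, bandwidth):
    """P/N needed to reach a given capacity over a given bandwidth."""
    return 2 ** (capacity / bandwidth) - 1


# Assumed example: a 3000 cycle per second band with P/N = 1000.
narrow_band = 3000.0
capacity = channel_capacity(narrow_band, 1000.0)
print(round(capacity))

# The same capacity over twice the bandwidth needs a far smaller P/N.
wide_band = 2 * narrow_band
print(round(required_signal_to_noise(capacity, wide_band), 1))
```

Doubling the bandwidth lets the required power ratio fall from 1000 to
roughly the square root of that figure, which is the kind of trade the
relation is used to assess.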


3.  ASSESSMENT  AND  DISCUSSION 


In summary of information theory in the network, one might say that it is a
kinematic theory of coding of messages drawn from a stationary universe with no
particular discernible order, in which they undergo the kind of kinematic
transformation that the electrical engineer associates with the linear
description of electrical transmission networks of either lumped or distributed
form, and which may be perturbed by what the electrical engineer has
kinematically idealized as stationary noise coupled to the network system in
idealized fashion, and which will deal with the subject matter of the
messages - as far as possible - independent of content, i.e., once again
kinematically idealized.

The idea of kinematics, the keynote of the definition, is that it will
deal with space-time motions independent of physical forces. The subject of
the physical forces and 'causality' is dealt with by kinetics, or dynamics.
(Webster: "kinematics - of motion in the abstract; kinematics - the branch of
mechanics that deals with motion in the abstract, without reference to the
force or mass.") It is desirable to know how such a possibility of
description, of attempts at a nearly pure 'kinematic' description of physical
phenomena, crept in, and what it implies. It begins with the classic
distinction between large signal electrical engineering and small signal
electrical engineering; the first became 'power' and the second
'communication.' (See Wiener (17) for example.) However the small signal
problem could well afford to use the well developed theory of linear
differential equations, linear transformations, and the linear superposition
theorem. As these results became embedded in the theory of algebraic
equations - notably in such results as the Nyquist plot - the engineer began
to view the physical AC networks much more by the 'location of its roots,' and
much more by the abstract transformation properties than by the physical
system, for distributed (e.g., the P.H. Smith chart) as well as lumped
systems. This culminated in Wiener's filtering theory, which now brought the
entire apparatus of mathematical statistics to this transformation theory.

It  was  then  a  plausible  extension  (we  are  not  belittling  its  brilliance) 
to  use  the  same  techniques  for  the  input  content  -  which  had  clearly  become 
data  processing  of  large  quantities  of  data. 

Is  there  anything  wrong  with  kinematic  treatment?  The  answer  is  no,  if 
there  is  a  large  routine  of  networks  that  are  sufficiently  described  by  such 
unitary  concepts  as  'the  roots  of  the  algebraic  equation'  that  describes  the 
transient  motion  of  a  lumped  network,  or  similar  impedance  matching  conditions, 
etc.  However  the  general  problem  is  coupling  to  other  systems  generally  through 
the  transport  properties  that  follow  from  the  'atomic'  nature  of  the  systems 
dealt  with,  and  the  'atomic'  nature  of  the  system  itself  which  often  limits  the 
range over which the system can be described.

Again, by the brilliance of Nyquist and the other communications
'engineers,' approximate techniques were developed for 'linearizing' the problem of
coupling,  and  replacing  the  distributed  nature  of  the  'atomicity'  effects  by 
their  major  effects  as  a  lumped  element.  Thus  the  communications  engineer 
learns that the main source of noise limiting an amplifying system is the input
stage Johnson noise because it undergoes the greatest amplification. An entire
routine  sequence  of  'equivalent  network'  constructions  is  gradually  developed  by 
which  he  represents  the  system  by  'block  diagrams'  in  which  a  conventional 
idealized geometric 'picture' or scheme of relations is proposed for the
coupling and transform effects of various elements. For 'passive' elements, this
has the defect that the elements are idealized and simplified as to their
transformation response. For active elements, what emerges is nonsense. For simple
active elements - a D.C. battery, an 'amplifying' tube used over a small range,
a  considerable  number  of  other  elements  -  this  gets  by.  Generally  (we  have 
this  by  sample  data  from  electrical  engineers)  the  empirical  result  is  observed 
to  see  whether  it  gets  by.  Today,  the  empiricism  is  often  tested  by  analogue 
computer models over an estimated range of pertinent variables. The basic
bugaboo being tested for, generally, is stability and optimalization.
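
As a hedged numerical illustration of the remark above on input stage noise
(a minimal sketch, not taken from this report), the following Python fragment
evaluates the Nyquist expression for the r.m.s. Johnson noise voltage of a
resistor, v = sqrt(4 k T R B), and then follows equal, independent noise
sources through an idealized amplifier cascade. The function names, the
three-stage cascade, and all numerical values (1000 ohm, 290 K, 10 kHz, a
voltage gain of 10 per stage) are assumptions chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def johnson_noise_vrms(resistance_ohm, temperature_k, bandwidth_hz):
    """R.m.s. open-circuit thermal noise voltage of a resistor (Nyquist formula)."""
    return math.sqrt(4.0 * K_BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)


def output_noise_powers(stage_gains, stage_input_noise_vrms):
    """Mean-square output voltage contributed by the noise at each stage input.

    Noise entering at stage i is amplified by stage i and by every later
    stage, so the earliest source sees the largest total gain.
    """
    powers = []
    for i, noise_vrms in enumerate(stage_input_noise_vrms):
        gain_to_output = math.prod(stage_gains[i:])
        powers.append((noise_vrms * gain_to_output) ** 2)
    return powers


# Assumed values: identical 1000-ohm sources at 290 K in a 10 kHz band.
noise = johnson_noise_vrms(1000.0, 290.0, 1.0e4)
powers = output_noise_powers([10.0, 10.0, 10.0], [noise, noise, noise])
first_stage_share = powers[0] / sum(powers)
```

Under these assumptions the first stage supplies about 99 percent of the
output noise power, which is the sense in which the input stage sets the
noise limit of the amplifying system.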

Subject  to  these  empirical  tests,  networks,  logic  circuits,  more  general 
switch and computer circuits, coding, decoding, etc. are designed with these
'kinematic'  ideas.  For  example,  information  theory  first  proposed  to  deal  with 
economical  language  information  transmission  devoid  of  content.  Subsequently, 
it  proposed,  by  a  series  of  extending  maneuvers,  to  bring  in  more  economical 
language  transmission,  only  by  form  and  not  content,  by  empirical,  essentially 
analogue,  computer  studies,  to  find  out  the  language  statistics  of  two,  three 
or  more  letter  chains,  of  words,  messages,  etc.,  again  seeking  purely  'kine¬ 
matic'  descriptions. 
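
The letter-chain statistics mentioned above can be sketched in a few lines.
The following Python fragment is a minimal illustration under our own
assumptions (it is not the procedure of any cited study): it counts chains of
n letters in a sample text and estimates the average uncertainty, in bits, of
the next letter given the preceding n-1 letters. The function names and the
sample string are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count every chain of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def conditional_entropy_bits(text, n):
    """Estimate the uncertainty of a letter given the n-1 letters before it.

    For n = 1 the context is empty and the result is the ordinary
    single-letter entropy of the sample.
    """
    grams = ngram_counts(text, n)
    total = sum(grams.values())
    # Count how often each (n-1)-letter context begins one of the n-grams.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    return -sum(
        (count / total) * math.log2(count / contexts[gram[:-1]])
        for gram, count in grams.items()
    )


# Hypothetical sample: a strictly alternating two-letter 'language'.
sample = "ab" * 50
single_letter_bits = conditional_entropy_bits(sample, 1)
two_letter_bits = conditional_entropy_bits(sample, 2)
```

For the strictly alternating sample the single-letter estimate is 1 bit per
letter, while the two-letter estimate is zero, since each letter is fully
determined by its predecessor. The estimate uses form only; nothing in it
refers to content.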

This  avoids  the  fundamental  problem  that  by  discovery  of  the  linear 
equivalent  transform  -  whether  by  step  function,  pulse  function,  sine  frequency 
response,  correlation  techniques  from  operating  records  -  you  may  be  able  to 
uniquely characterize the linearly equivalent network or block diagram for a
domain  of  space  and  time,  but  you  cannot  establish  the  most  generally  equi¬ 
valent  non-linear  network  that  has  the  empirically  discovered  properties.  In 
other  words  you  cannot  treat  these  fields  as  equivalent  boundary  value  prob¬ 
lems  embedded  in  linear  theory,  and  in  fact  the  'chains'  of  connectivity  and 
coupling  that  you  propose  may  not  even  be  causally  correct.  (The  old  saws 
about  rice  in  China  and  its  correlants,  etc.  are  avoided  by  most  people  for 
their relevance here.) This is particularly noteworthy today in the completely
loose  use  made  of  the  concept  of  feedback,  and  controlled  variables,  say  in 
such  difficult  systems  as  the  biological  system. 
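
The limitation can be illustrated numerically. In the following Python sketch
(our own toy example; the two elements, their coefficients, and the function
names are assumptions made only for illustration), a linear element and an
element with an added cubic term are each probed with a sine input and fitted
with a single equivalent gain by least squares.

```python
import math


def linear_element(x, gain=2.0):
    """A purely linear memoryless element."""
    return gain * x


def cubic_element(x, gain=2.0, cubic=5.0):
    """An element with the same small-signal gain plus a cubic term."""
    return gain * x + cubic * x ** 3


def fitted_gain(element, amplitude, samples=1000):
    """Least-squares gain of the best linear fit y = g*x over one sine cycle."""
    xs = [amplitude * math.sin(2.0 * math.pi * k / samples) for k in range(samples)]
    ys = [element(x) for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Small probe: the two elements have nearly identical linear equivalents.
small_gap = fitted_gain(cubic_element, 1.0e-3) - fitted_gain(linear_element, 1.0e-3)

# Large probe: the 'equivalent' gains separate widely.
large_linear = fitted_gain(linear_element, 1.0)
large_cubic = fitted_gain(cubic_element, 1.0)
```

With the small probe amplitude the two fitted gains agree to within a few
parts per million, so the linear equivalent found in that domain cannot tell
the elements apart; at unit amplitude the fitted gains are 2 and about 5.75.
The linear characterization is thus valid only for the domain in which it
was measured.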

What  is  at  stake  are  the  causal  chains  known  as  physical  laws.  Typically, 
a  physical  causal  chain  as  it  might  exist  in  a  complex  system  (the  author  has 
recently  done  this  for  the  hydrodynamic  field)  involves 

equations of exterior motion - 'the equations of motion'

equations of interior motion - 'the thermodynamic equations of change'

continuity equations - 'the equations of conservation of mass.'

This  may  lead  typically  to  an  n-equation  set.  (For  example,  the  author 
has  explored  a  5-equation  set  for  turbulence  and  shown  that  stability  results 
are  to  be  associated  with  an  8th  order  complex  differential  equation.)  The 
solution  of  these  sets  can  then  reveal  the  nature  of  stability,  and  the  nature 
of  how  the  various  elements  are  coupled. 

Generally, in tackling such a complex problem, very simple boundary
conditions must be accepted. Nyquist, for example, assumed, in reality, a bounding
cavity  with  isothermal  walls  in  order  to  discuss  a  dynamic  equilibrium  result 
known  as  Johnson  noise.  In  such  complex  problems,  a  kinematic  description 
generally  emerges  from  the  response  complex  as  a  natural  nearly  obvious  result. 
One may give Shannon credit for forcing the results independent of the network
analysis;  however,  it  doesn't  improve  the  status  of  network  science. 


The  general  characteristics  that  emerge  from  such  a  complete  analysis 
are that the system can show both internal and external - in general
oscillatory - equilibria; that these states would result from driven inputs, from
self-generated limit cycles, and from any assumed underlying active 'atomicity.'

In  the  electrical  network,  this  has  been  simply  disposed  of  by  regarding 
the  boundary  drive  as  'signal';  by  regarding  limit-cycles  as  'instability' 
generally  to  be  avoided,  except  in  the  most  recent  sophisticated  techniques 
as  in  'bang-bang'  art,  adaptive  systems  art,  or  computer  control  art;  and  by 
regarding  only  simple  'atomistic'  models  for  internal  noise  and  noise  that 
drifts  in  from  external  sources.  Our  main  criticism  is  in  the  substitution 
of  linear  coupling  for  unproved  couplings  of  either  a  linear  or  non-linear 
nature. 

Thus validly the physics of 'noise' is pointed up in Moullin (1938),
Lawson  (1949),  Bennett  (1960),  Bell  (1960).  It  stems  from  Einstein's  work, 
that  began  on  Brownian  motion.  It  is  to  Nyquist's  credit  that  he  brought  the 
ideas  to  electrical  networks.  It  is  to  the  credit  of  Wiener  and  Shannon  that 
they  developed  its  limit  on  signal  transmission. 

However  the  electrical  engineer  does  not  have  the  correct  general  model 
of  an  equivalent  network  element  (the  R,  C,  L,  with  an  AC  and  DC  source,  with 
an  external  noise  source  connected  somewhat  arbitrarily).  The  'proof'  of  this 
statement  is  that  he  cannot  so  represent  an  elementary  flow  element  of  a  tur¬ 
bulent  field,  whereas  he  can  for  a  laminar  flow  field.  The  point  we  are  making 
here is that the elementary element may be linearly unstable and not
constructable out of linear elements without non-suches like negative resistances, etc.

Thus while practice may still use linear network theory for electrical
networks,  for  coupling  of  elastic  elements  in  an  airplane  or  automobile,  for 
coupling  chains  in  an  election,  for  economic  input-output  tables,  for  hydro- 
logical  or  meteorological  networks,  for  hormone  interaction,  etc.,  the  physical 
scientist  must  seek  to  develop  more  plausible  'causal'  chains  that  relate  the 
real  parametric  degrees  of  freedom  of  a  system;  he  must  try  to  come  up  with 
better  diagrams  of  how  and  where  the  limiting  factors  are  that  produce  'error,' 
'uncertainty,'  'limiting  sensitivity,'  or  'noise'  in  real  systems  and  how  they 
may  be  described;  and  he  must  try  to  synthesise  the  response  of  these  systems 
to  desired  boundary  changes  known  as  cohesive  signals  to  help  give  them  metro¬ 
logical  'meaning.'  These  are  the  tasks  by  which  he  can  enrich  and  deepen  the 
results  needed  by  the  engineer  for  information  transmission  in  the  general 
systems  network. 

One  significant  ingredient  to  be  noted  is  what  we  have  referred  to  as  an 
interacting or non-interacting property. There is a significant difference
between the networks in which the signal passes without much power interaction
with the level of power involved internally, and that in which the signal
sources are heavily involved. Current analyses do not distinguish these two
cases.


CLASS 2 PROBLEM - INFORMATION THEORY FOR HUMANS

1. SOURCE MATERIAL


Cherry  (49)  is  a  good  transition  source  from  the  first  type  of  problem 
to  the  present,  second  type.  One  may  also  inspect  Brillouin  (41),  and  Pierce 
(21)  for  some  further  introductory  ideas.  It  is  then  useful  to  comb  the  London 
symposia  on  Information  Theory  (52)  held  in  1950,  1952,  1955  and  1960.  There 
are  three  Prague  symposia  (53)  in  1956,  1959,  and  1962,  heavily  mathematical. 
There is the 1958 OSR symposium, edited by Taube and Wooster (54). There is
the  National  Academy  of  Sciences  Conference  (55)  in  1958.  A  more  specialized 
symposium  was  held  on  machine  translation  (56)  in  1960,  or  on  character  rec¬ 
ognition  (57)  in  1962.  While  far  from  complete,  such  sources  are  an  apt 
beginning. 

It  appears  likely,  from  cursory  review,  that  the  content  of  this  second 
type  of  problem  gradually  has  become  defined  out  of  the  interests  assembled  at 
the early London symposia on information theory (52). It is likely the
enthusiasm and interests of the organizers, and their wise choice of invitees,
that helped create such a diverse interdisciplinary problem base for the
subject. It may thus perhaps be most useful to briefly trace the threads that
have  emerged  within  this  subject. 


2.  INFORMATION  THEORY  IN  THE  NETWORK 


One  extension  of  information  theory  in  the  network  -  which  might  have 
been a division in Nyquist's mind which led him to two separate directions,
one to define noise and its connection with statistical mechanics in the
network (there obviously were other workers; this characterization is for the
quality of the problem), the other to define the kinematics of information
transmission - was furnished by Brillouin (41), whose 1956 SCIENCE AND
INFORMATION THEORY attempted to resynthesize these two directions. He sought to tie
Shannon's 'entropy' concept back to physical entropy. For example, in his
summary, "Information and physical entropy are of the same nature. Entropy
is a measure of the lack of detailed information about a physical system.
The greater is the information, the smaller will be the entropy. Information
represents a negative term in the entropy of a system, and we have stated a
negentropy principle of information." Brillouin further points out "The origin
of our modern ideas about entropy and information can be found in an old paper
by Szilard (1929), who did the pioneer work but was not well understood at the
time."
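
The correspondence Brillouin draws can be given a numerical form. The sketch
below (in Python; the function names and the example distributions are our
own assumptions) computes Shannon's measure for a discrete distribution in
bits and converts it to thermodynamic units using the standard factor
k ln 2 per bit, with k the Boltzmann constant. It is intended only to
indicate the scale of the quantities involved, not to restate Brillouin's
derivation.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def shannon_entropy_bits(probabilities):
    """Shannon's measure H = -sum(p * log2(p)) for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def bits_to_physical_entropy(bits):
    """Convert an amount of information in bits to entropy units (J/K)."""
    return bits * K_BOLTZMANN * math.log(2.0)


def information_gain_bits(prior, posterior):
    """Information gained when the uncertainty falls from prior to posterior."""
    return shannon_entropy_bits(prior) - shannon_entropy_bits(posterior)


# Assumed example: four equally likely states, then the state becomes known.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [1.0, 0.0, 0.0, 0.0]
gain_bits = information_gain_bits(prior, posterior)
entropy_decrease = bits_to_physical_entropy(gain_bits)
```

A gain of one bit corresponds to an entropy decrease of roughly 1e-23 J/K,
which indicates how small the thermodynamic equivalent of ordinary message
information is.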


There is little doubt that Shannon's and Brillouin's works made the
concept of 'entropy' fashionable at the philosophic tails of most scientific
disciplinary consideration. All such discussion we have heard (the latest,
for example, was G. Sacher's discussion of the representation of the causes of
biological mortality during the week of January 16, 1966 in a New York Academy
conference on Perspectives of Time) has represented provocative thinking and
groping; however, we do not yet have any real assurance that it has represented
an  operationally  useful  posture.  The  question  still  remains  open.  The  paths 
from  information  theory  in  a  general  network  to  statistical  mechanics  of  a 
general  system  remain  open  through  this  work. 

There  is  little  doubt  that  information  theory  in  the  network  furnished  a 
fruitful  point  of  view  -  and  likely  was  stimulated  by  the  same  scientific  time¬ 
liness  -  in  the  computer  development.  One  may  note  early  in  the  information 
science  conferences  the  continuing  sustained  interest  in  computer  aspects  of 
coding,  checking,  etc.  (to  mention  a  few,  Bell  Labs,  University  of  Illinois, 
MIT,  Bureau  of  Standards,  Remington-Rand,  etc.).  It  is  outside  the  scope  of 
this  report  to  track  the  computer  technology  explosion  in  the  information 
sciences  from  1950  onward.  The  reader  is  referred  elsewhere.  Without  such 
study,  one  might  hazard  a  guess  that  a  considerable  amount  of  development  of 
such information went 'under wraps' as commercial, security, and contractual
advantage was developed and milked from the field. More reliable judgments
would  require  much  deeper  exploration. 

The impact in this area emerges in such detailed information theory
material as Reza (44), in a philosophic view of 'information content' and the
physical network, in computer philosophy, and in the introduction of stochastic
mathematics to the 'deterministic' network. Though the latter view has not
been stressed, considerable mathematics has developed. (A highly abstract
source such as Vitushkin's THEORY OF THE TRANSMISSION AND PROCESSING OF
INFORMATION, Pergamon, 1961, or (53), or the commonness with which source books on
stochastic  processes  are  referenced  in  this  literature  well  attests  to  this.) 

Examples  of  the  more  detailed  problems  that  the  communications  engineer 
began to face are contained in the papers of Marcou and Daguet, Licklider,
Allanson and Whitfield, and Gregory in (52).

The  transition  to  problems  other  than  the  statistical  properties  of 
communications may be noted in (52) in papers by Loeb, Fry-Denes, and Davis et
al. that begin to probe at and elicit response on the problem of pattern
recognition (such papers as Valensi on coding color for the normal eye, or Huggins
on characterizing the dynamics of the ear through its structure have been part
of the identification of either the phenomenological mechanisms or the
characteristics of such sensory end-puts as vision, hearing, or speech, traditionally
part of communications engineering); and the formidable beginning by Bar-Hillel
and Carnap to tear the problem away from the statistical properties of signs
to the deeper problem of semantic meaning of the 'signs' of language.


3. PATTERN RECOGNITION


The  attributes  of  'pattern'  or  'form'  extended  beyond  the  question  of 
simply  coding  letters,  or  words,  or  even  sounds.  It  is  proper  to  mention 
Helmholtz, Alexander Graham Bell, Fletcher and Dudley's 1936 Vocoder (to
mention a few sources popular in America) to indicate a more complex interest in
'form' - here of sound - and 'communications,' mostly telephonic. Such
problems have come to youthful maturity in Gabor's work (1946 onward) on the
structural aspects of communication. There may exist a basic signal element,
into which complex signals such as speech (speech, surprisingly, represents
an overly elementary example) may be analyzed, which is finite in both
frequency and time. This is the 'atomistic element' or the 'unit of structural
information' of an information theory. It was referred to by Gabor as a
'logon.' Gabor extended this concept to optical signals in (52), and the
papers by Meyer-Eppler and Darius begin to tie the information in visual
signals together with statistical correlation techniques, and with the information
about  symmetry  known  in  crystallography. 
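
The finiteness of the signal element in both frequency and time can be
checked numerically. The following Python sketch is our own illustration,
not Gabor's construction: it takes a Gaussian pulse (the width s, the grid
sizes, and the function names are assumptions), computes the r.m.s. widths
of its energy density in time and in frequency, and forms their product.
With these r.m.s. definitions the product cannot fall below 1/(4 pi), and
the Gaussian pulse attains that bound; Gabor's own normalization of the
widths differs by a constant factor.

```python
import math


def rms_width(xs, weights):
    """R.m.s. spread of xs about their weighted mean."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    variance = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return math.sqrt(variance)


def gaussian_time_bandwidth_product(s=1.0, n_time=801, n_freq=301):
    """Product of r.m.s. duration and r.m.s. bandwidth of exp(-t^2 / (2 s^2))."""
    ts = [-8.0 * s + 16.0 * s * i / (n_time - 1) for i in range(n_time)]
    pulse = [math.exp(-t * t / (2.0 * s * s)) for t in ts]
    dt = ts[1] - ts[0]
    fs = [(-1.5 + 3.0 * j / (n_freq - 1)) / s for j in range(n_freq)]
    # The pulse is real and even, so its Fourier transform reduces to a
    # cosine integral, evaluated here by a simple Riemann sum.
    spectrum = [
        sum(g * math.cos(2.0 * math.pi * f * t) for g, t in zip(pulse, ts)) * dt
        for f in fs
    ]
    sigma_t = rms_width(ts, [g * g for g in pulse])
    sigma_f = rms_width(fs, [a * a for a in spectrum])
    return sigma_t * sigma_f


product = gaussian_time_bandwidth_product()
lower_bound = 1.0 / (4.0 * math.pi)
```

Making the pulse shorter in time (a smaller s) widens it in frequency in
the same proportion, so the product is unchanged; this is the sense in which
the element occupies a fixed minimum area in the time-frequency plane.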

The  branch  that  begins  pattern  recognition  on  a  theoretical  foundation 
is  perhaps  the  1947  paper  of  Pitts  and  McCulloch  on  "How  We  Know  Universals, 
the  Perception  of  Auditory  and  Visual  Form,"  and  the  1959  Lettvin,  Maturana, 
McCulloch, Pitts paper "What the Frog's Eye Tells the Frog's Brain."

While  the  physical  ideas  are  all  quite  profound  and  have  had  a  long  his¬ 
tory,  it  was  elementary  papers  such  as  these  that  began  the  real  theoretical 
construct  of  what  is  the  nature  of  human-like  information  in  the  brain,  and 
what  'patterns'  of  form  and  function  the  brain  recognizes.  (A  1965  paper  of 
S.  Sherwood  in  the  same  source,  the  Bulletin  of  Mathematical  Biophysics,  in¬ 
dicates  that  the  question  of  how  it  is  done  still  remains  open.) 

Recognizing  this  basic  point,  one  may  trace  what  has  been  done  in  pat¬ 
tern  recognition  in  large  theoretical,  experimental,  and  practical  hardware 
construction  and  development.  Examples  are  Selfridge  in  (52);  scattered  dis¬ 
cussion  in  Cherry  (51)  (who  proposes  Charles  Peirce's  writings  as  a  good 
beginning  philosophic  source);  or  the  extensive  Perceptron  development  by 
Rosenblatt  (see  for  example  (58)).  A  measure  of  practical  development  can  be 
seen  in  (57).  We  find  the  practical  work  described  by  Rabinow  and  by  Fitz- 
maurice  quite  interesting.  Work  at  MIT  is  alluded  to  in  Roberts'  paper.  With 
our personal knowledge of a number of the authors, we can accept Rabinow's
introduction "We think, in our company, that we can read anything that is printed,
and  we  can  even  read  some  things  that  are  written.  The  only  catch  is,  'how 
many  bucks  do  you  have  to  spend',"  or  Murray  Eden's  beginning  work  (52),  1961, 
on the "Characterization of Cursive Handwriting" which indicates that
deterministic rules applied to known or recognizable phenomena can extract its
information content by mechanistic rules without great error. It is clear that
such  large  cost  problems  as  the  Post  Office  read-out  problem,  or  handling 
Russian  information  provided  sufficient  fund  impetus  for  the  large  scale 
practical development of optical scanning of words. It is obvious that
pattern recognition in photographs (particularly with new theoretical constructs
and computer assistance) - that played such a notable role in the Cuban
crisis and in spy and searching satellites - has proceeded to an extremely
sophisticated  art.  Again  the  reader  must  be  referred  to  other  sources  not 
known  by  us. 

The article by Barus in (57) is on a problem from a more general class -
to recognize pattern information where the pattern or its statistics are
unknown to the designer. It is essentially assumed that the unknown patterned
'language' is drawn from a source so as to form a stationary, ergodic
sequence, as far as samples are concerned. To what degree such efforts have
proceeded meaningfully is not yet known. It has led to still another
direction of learning machines, to which the reader will again have to be directed
separately. That routines for simple kinds of learning machines (i.e., to
teach  members  of  a  stationary  population  how  to  learn)  can  be  developed  is 
obvious. 

In  summary  it  appears  that  recognition  from  a  stationary  information 
source  or  for  a  stationary  population  is  a  deterministic  problem,  that  the 
problem  is  generally  solved  by  simply  examining  or  testing  any  hypothesis 
experimentally to see if it will work. As long as the sensory type
detectors are involved - electromagnetic spectrum; mechanical-acoustic spectrum;
to a lesser extent, codable chemical compound spectrum - it may be expected
that such problems lend themselves 'quickly' - with money - to practical
solution. The problems that remain are those which we cannot well categorize
or where we have not yet been well able to distinguish signal and noise, such
as:


Pattern  recognition  of  movement  in  a  somewhat  non-stationary 
universe  (the  class  of  problem,  different  from  what  was  treated  by 
Wiener, that was brought up in 1927 by G. Udny Yule, or in 1940 by
Jeffries).  A  typical  example  is  the  movement  of  the  economy. 

Pattern  recognition  in  complex,  loose,  non-linear  systems, 
like  the  brain,  or  in  recorded  human  information. 

We  do  not  consider  the  solution  of  pattern  recognition  in  these  problems 
to  be  very  difficult,  but  only  time  consuming,  somewhat  expensive  (but  not  in¬ 
ordinately  so),  and  not  yet  'recognized'  by  society  as  being  significant. 

An illustrative highly abstruse paper on the subject is D. Brick,
"Pattern Recognition ..." in the 1965, Volume 17, Progress in Brain Research
series on CYBERNETICS OF THE NERVOUS SYSTEM. His references embed the
subject well in the theoretical speculations that have been brought to this field.


4.  THEORY  OF  MEANING 


Shannon  avoided  the  option  of  treating  the  problem  of  meaning,  the  prob¬ 
lems  associated  with  which  have  been  of  traditional  philosophic  concern.  How¬ 
ever  even  if  the  subject  is  not  treated,  philosophers,  linguists,  and  many 
others  will  get  caught  up  in  it. 


For  example,  the  December  17,  1965  issue  of  the  New  Statesman  has  a 
review  article  on  the  foundations  of  academic  teaching  of  English  literature 
in  England.  It  comes  as  quite  a  surprise  that  such  teaching  began  only  in 
1828 and that the difficult problem was to include "in theoretically equal
proportions the study of English as language and as literature" ("though the
syllabus was in fact grotesquely overweighted linguistically"). Thus the
human brain, in its most rational 'normal' state seeks to identify something in
signal content other than its 'form' (a schizophrenic tendency, to which poets
are also addicted) and seeks to identify 'meaning.' We intend no implication,
either  cynical  or  purely  fatuous.  It  simply  points  out  that  the  problem  of 
meaning  is  present,  in  all  fields,  at  all  times,  and  requires  an  extremely 
large  discussion  to  do  it  justice.  We  will  only  touch  on  it  lightly. 

Cherry (51) refers to Von Frisch (animal communication by signs
without language), J.B.S. Haldane, A.N. Whitehead, Kurt Lewin (for inspiration on
network theory in psychology), Dalgarno (on classification of ideas),
Descartes and Leibnitz (on possible reasoning machines), de la Mettrie (on
the faculty of thinking), Locke (on ideas), Mackay (on the elementary quantal
and metric nature of information), Peirce (on meaning), Ogden and Richards
(59), Monboddo (on language), Bloomfield and Bloch and Jakobson (authors on
language from the linguist's point of view - 'phonemes'), Zipf (language
statistically viewed); and Carnap (syntax for logicians, "pure semantics ...
is entirely analytic and makes no reference to real personal experience or
real facts about the world. ... Syntactical truth should be distinguished
from experimental, factual, plain truth" is quoted by Cherry); Quine,
Bar-Hillel, Z. Harris (these last authors are all involved in the language-logic
arguments), Ampere and Bentham (logical classification of knowledge by
successive dichotomies), J.S. Mill, Weaver (in Shannon-Weaver's book), Descartes
(the dual inner-outer world), Popper (language and the mind-body problem)
and Von Neumann.

We  can  use  these  bits  for  a  beginning.  Cherry  points  out  that  the 
Wiener-Shannon statistical theory of communication concerns only signs. This
limitation  satisfies  only  the  problem  of  the  communications  engineer  on  how 
to  design  immediately.  A  broader  question  arises,  embedded  in  the  classical 
philosophic  problem  of  a  theory  of  knowledge.  Whereas  this  could  be  considered 
previously  in  the  time  domain  of  2500  years,  now  it  has  become  a  matter  of 
urgency in the time scale of 10-20-30 years. What do such philosophic
questions have to do with real decisions on important matters? We can only point
out  once  more  that  science  and  technology  have  again  run  into  the  philosophic 
impasse  and  society  is  ready  to  pay  for  the  solution.  (A  recent  translation 
from Atlas, November 1965 from Yunost, Moscow by Y. Sbcherb-1' on scientific
inquiry  quotes  the  French  newspaper,  Paris-Soir,  in  1937  on  the  atomic  nucleus 
"Our  scientists  are  undoing  themselves;  instead  of  occupying  their  time  with 
real  problems,  they  are  busy  making  esoteric  observations  in  connection  with 
atomic  energy.  Instead  of  flying  in  the  clouds,  they  would  do  well  to  estab¬ 
lish  closer  contact  with  the  earth  and  to  busy  themselves  with  tangible 
matters."  A  scientist  tackling  the  'theory  of  knowledge'  can  have  even  greater 
apprehension.) 


Basically, logic has been frozen at the level of the Aristotelian
concept for over 2000 years. A revolution took place in the last century and
the mathematical foundations for a new theory of logic were laid. For a good
beginning source, we refer to an 'elementary,' but sharply summarizing source,
Cohen and Nagel (60). For the unfolding beyond this introduction, one can
refer to Cohen (61), or Nagel (62).

The  whole  development  of  a  static  philosophy  of  knowledge  -  which  is  so 
ably  presented  in  Cohen  and  Nagel  -  represented  the  main  chain  of  western  de¬ 
velopment  of  philosophy.  It  is  a  categorical,  hierarchical,  dichotomous 
philosophy.  Its  epitome  has  been  the  development  of  a  two  valued  logic.  (In 
the end, it has been the guide to the empiricism of the Shannon theory of
information. For those who will wonder if there is necessity for anything
to go beyond, we can refer to a recent talk by an eminent logician, G. Gunther,
connected with the computer developments at the University of Illinois, given
at the New York Academy meeting on the Perspectives of Time, January 17-20,
1966. Gunther pointed out again and again that the mind-body problem cannot
be pushed into an ontology with two values. As a simplistic example, the mind
encompasses the universe, the universe includes the mind, but the mind is
still not equivalent to the universe. It is such problems that have beset
the computer designer in his search for a more nearly 'thinking-machine'; it
has also been interesting to bionics.)

Another doctrine which has emerged was the Hegelian-Marxian dialectic,
which attempted to deal in a mystical way with the problem of being and
becoming by asserting a means by which values at one hierarchical level might
transform into another. Its defect was its metaphysical and timeless nature.
(Those of us exposed to M.R. Cohen were well aware of his incisive tongue in
debating the Marxian dialectic.)

Another doctrine, of which we are ignorant, is the eastern view of
nature. (We can refer to the writings of Dr. Siu, THE TAO OF SCIENCE, or more
recently,  we  have  been  urged  to  read  the  Chinese  classic  I  CHING  (Dover,  1963) 
by  an  engineering  friend,  H.  Ziebolz,  who  is  now  in  Tokyo  earnestly  attempting 
to  straddle  two  civilizations,  with  the  competence  to  achieve  some  success.) 

What  has  emerged,  in  the  last  century,  is  a  statistical  view  of  nature. 
The stationarity of processes - in a stochastic sense - arose in material
developed by Pascal, Gauss, Bernoulli, Mendel, Planck, Darwin, Malthus,
Gompertz, Einstein, Gibbs, Bohr, Fisher, Markov, which suggests a few of the
famous problem areas. Probabilistic logics, mathematics, and theories of
knowledge,  including  scientific  theory,  were  thus  born  and  highly  cultivated 
(Nagel  is  a  good  source  for  such  introduction,  either  in  (62)  or  in  Newman). 
There  is  little  doubt  that  the  views  of  Wiener  and  Shannon  that  led  to  an  in¬ 
formation  theory  stemmed  from  this  line. 

However, what is missing is the classical physical-dynamic view that
can perhaps deal in an isomorphic way with the problem posed by the explanation
of form and function, without becoming involved in a tricky metaphysical
dialectic. Having asserted this theme, we may return to the earlier views by
which statics and statistics were merged.

Cherry offers Morris (63) as a good source for discussion on a theory
of semiotics, a theory of signs, which are the basis for communications.
According to Peirce, a sign should be capable of evoking responses which
themselves are capable of acting as signs for the same designated object.
Semiotics has three levels: syntactics - the study of signs and their
relations; semantics - the study of the relations between signs and the
designated; pragmatics - the study of relation of signs and users. These
overlap. These three levels concern signs and relations, or rules. The
rules are not inherent in the language and thus require a metalanguage (thus
the mind-body problem sneaks in). Syntactics, or language as a calculus, is
embedded in semantics which abstracts the content of signs and things, which
is embedded in the real-world-real-life problems level. Logic and life are
thus not coextensive. "Pragmatic questions cannot be discussed in terms of
syntactics or semantics."

(At  this  point  we  are  ready  to  join  battle  for  new  ideas.  To  do  this, 
we  will  have  to  tackle  the  third  class  of  problems  -  i.e.,  the  nature  of  the 
brain.  In  (64),  p.  10-26,  we  proposed  a  primitive  model  of  the  brain.  It 
can  be  summarized  simply  as  follows: 

The 'purpose' of the brain (i.e., teleology, or the answer to what the
brain does) is that it transforms knowledge of its present input state, and
a suitable number of derivatives, and of all of its past states (i.e., it
possesses a hereditary property) to transfer these 'inputs' into an output
state (thus making it a complex transducer), in which action is deferred or
suspended on the basis of an internal computer with logic and memory (i.e.,
a computer controlled transducer) in which there is a guiding algorithm which
optimalizes one or more overall properties of 'advantage' to the system.

'Knowledge' is then the measures of present inputs, of past inputs,
of evoked computer action, and of the deviations from an optimalized dynamic
state. It does not include the guiding algorithm.

The key words are:

input-output transformation
memory of past inputs
evoked computer response
the deviation from optimal
optimalizing algorithm complex.

Thus, we 'learn' the number 1, psycho-logically, not logically as the
class of all elements that present one, but as the very much more limited
class of examples, ordered in time, by which we each individually learned the
number one, etc. for all numbers. We always perform an induction that
jumps from, I know one example, I know two examples, I know three examples,
to I know 'infinite' examples. In terms of (64), we generalized by locking
into an analogue of the number that henceforth would serve us - unless the
analogue received moderate correction later in time. This was the 'abstract
ideal' that psychologically would serve us henceforth. As we got more
sophisticated, we would begin to develop these ego ideals into more perfect
logical games, called various extensions of number and branches of mathematics.
We are not prepared at this time to lay down the 'laws' of formation of all the
primitive games of mathematics, although we can enunciate and enumerate quite
a few.


However, we are prepared to defend and expand on the thesis that the
'brain' of the complex biological system recognizes and idealizes number,
category, sign, symbol, etc. by a variety of ego ideal analogues held in
memory by the brain. This, plus the outside world, is the stuff that
'pragmatic' reality is made of. However, we do not take seriously any discussion
of man and the world in purely formalistic terms. We shall always be
viewing the dynamic physical problem of what it is that the
physiological-psychological mechanisms in the body are doing in response to any
question like "What is it that a man knows, and how is it that he does?")

The  Wiener-Shannon  theory,  dealing  only  with  signs,  as  particulars 
drawn  from  a  general,  lies  at  the  syntactical  level,  and  therefore  within  and 
basic  to  semantic  or  pragmatic  aspects  of  information.  It  does  not  concern 
meaning. 

(Here we take issue with Cherry. It is the sense that the human can
change the base of syntactic communication using pragmatic 'meta-language'
cues that casts doubt on the embedding of syntactic information within
pragmatic. The next few information theory problems we will discuss are embedded
in the syntactic, semantic levels; yet our thesis over and over again is that
it is the content of the pragmatic 'meta-language' mode of the human, which
is not meta-language if you get to understand the human, which governs
information transmission. Thus our criticisms will not come into full focus until
we discuss the third class of problems. The engineer may ask "Can't we deal
with the more pedestrian, formal problem in a routine way?" Our answer is
"yes"; the work of Rabinow, Fitzmaurice, Eden, Farrington Electronics, etc.
in pattern recognition; Sperry-Rand, IBM, etc. in computers, etc., show that
this is true. However, the limits are not reached until the human repertoire
of new 'scientific games' is exhausted. This we have not done. This is the
problem of building a 'thinking machine,' a machine that includes memory,
computation, self-awareness, induction, etc. We believe that (62) provides us
with clues on how to do this and demonstrates a fuller nature of 'meaning.')

Semantic pragmatic information is generally processed, i.e., offered or
sought, by 'successive selection' in hierarchical or taxonomic schemes, such
as classes, orders, families, etc., or dichotomies. (Note this persists in a
western Aristotelian static two-valued logical system of identification.)
However, J. S. Mill pointed out that induction and not deduction is the only
road to new knowledge (and the Gestaltists showed the fragmentary discrete
nature of induction - it is these 'facts' that must be encompassed in a theory
of human knowledge and discovery).

At this point the work of Carnap and his colleague, Bar-Hillel, must be
introduced. We can propose as sources (65), (66), or (52). It is a use of
Carnap's theory of inductive probability. Their theory, relating to language
systems, is concerned with the semantic-information content of simple
propositions.


Inductive probability is concerned with the odds on hypotheses based on
evidence. This process goes on in signal communication between people (I
wonder what he really meant?) as well as in the scientist's mind. In his 1950
book, Carnap attempts to sharpen this tool. He makes use of Bayes' theorem
for the calculation of a posteriori probability. It is generally only the
first step of assigning equal a priori probabilities before the evidence that
disturbs people. (However, this is quite good in science since, contrary to
popular judgment, in difficult problems one might just as well assign all
possible hypotheses in the universe equal probabilities - the point we made
in (1).)
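
The effect of the equal a priori assignment can be made explicit with a short
worked form of Bayes' theorem. The symbols used here (H_i for the hypotheses,
E for the evidence, n for the number of hypotheses) are notation assumed for
illustration and are not taken from Carnap's text.

```latex
P(H_i \mid E)
  = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\, P(H_j)}
\qquad\text{and, with } P(H_j) = \tfrac{1}{n} \text{ for all } j,\qquad
P(H_i \mid E)
  = \frac{P(E \mid H_i)}{\sum_{j=1}^{n} P(E \mid H_j)} .
```

With equal a priori probabilities the common factor 1/n cancels, so the
a posteriori odds between any two hypotheses reduce to the ratio of how well
each accounts for the evidence.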

The semantic-information content of simple statements is at issue in
their theory, not the pragmatic value to any particular user; i.e., the theory
deals only in semantic information and not really in communication. "Care must
be taken to guard against temptation to use this theory, and the information
measure it sets up, in relation to experimental psychological work," Cherry
warns, for example.

Language systems, as idealized into an artificial language with clearly
defined systems and values of somewhat simple nature, provide quantized
states (statements) that can be located in an attribute space of cells to
form a structure-description of a semantic system (such as characterization
of library books), in which the individual propositions form a
state-distribution within cells (66). This is all analogous to the setting up of
statistical mechanics for a system of particles. Bar-Hillel and Carnap then
develop theorems which conceptually parallel Shannon's theory, including such
concepts as semantic noise. It is suggested that the statistical theory of
communication can be included in the semantic theory, but not conversely,
even though the semantic theory is restricted to simple sentences. The
reader is referred to (52), 1953.
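
The cell picture above can be sketched in a few lines of code. The sketch
below is only an illustration under stated assumptions: a hypothetical toy
language of three atomic propositions (A, B, C), equal measure on every state
description, and the two measures as they are usually stated for the
Bar-Hillel-Carnap theory, cont(s) = 1 - m(s) and inf(s) = -log2 m(s), where
m(s) is the logical probability of statement s. The function and variable
names are invented here and do not come from (66).

```python
from itertools import product
from math import log2

# Hypothetical toy language: three atomic propositions, each true or false.
ATOMS = ("A", "B", "C")

# Each state description assigns a truth value to every atom (2**3 = 8 cells).
STATE_DESCRIPTIONS = [dict(zip(ATOMS, values))
                      for values in product((True, False), repeat=len(ATOMS))]


def logical_probability(statement):
    """Fraction of state descriptions in which the statement holds.

    Assumption: every state description carries equal measure.
    """
    holds = sum(1 for state in STATE_DESCRIPTIONS if statement(state))
    return holds / len(STATE_DESCRIPTIONS)


def content(statement):
    """cont(s) = 1 - m(s): the measure of the states the statement excludes."""
    return 1.0 - logical_probability(statement)


def information(statement):
    """inf(s) = -log2 m(s), in bits; undefined for a contradiction (m = 0)."""
    m = logical_probability(statement)
    if m == 0:
        raise ValueError("a contradiction has no finite inf value")
    return -log2(m)


if __name__ == "__main__":
    # A statement is modelled as a predicate over a state description.
    print(information(lambda s: s["A"]))             # one atom asserted
    print(information(lambda s: s["A"] and s["B"]))  # two atoms asserted
    print(content(lambda s: True))                   # a tautology excludes nothing
```

A statement that excludes more cells carries more semantic information, and a
tautology carries none. Nothing in the calculation refers to a user, which is
the sense in which the measure is semantic rather than pragmatic.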

In particular it is valuable to note MacKay's leading question and
Bar-Hillel's answer in (52), 1951. On the one hand, MacKay wishes to stake his
own claim for a 'metron' content, or metrical-information-content, as
promulgated in 1948, and presented in (52), 1950 (the number of units of evidence
contained in a 'representation' or description of phenomena). On the other
hand, he tries to get Bar-Hillel's concurrence that Shannon's theory is to
be regarded as a statistical theory of communication (of signs) rather than
an ambiguous 'theory of information.' Further, MacKay points out that the
European (English?) quantitative view of information was introduced in connection
with the design of experiments. Bar-Hillel confirms the concept that much of
the confusion arose from a lamentable lack of familiarity in America with
Fisher's work - which can easily help to mislead linguists and psychologists
in the theoretical considerations. Such efforts are not to be viewed as
Shannon's fault.

MacKay argues his own views of meaning in (52), 1956. (By this time,
the content of the 'information theory' subject included Gabor's logon content,
Shannon's statistical theory of communicating signs, MacKay's metron content,
and the Bar-Hillel-Carnap (B-C) semantic theory of a linguistic system.)

First he proposes to take over the B-C semantic measure of information within
the scope of his 1950 metron content concept of information-content, in
particular, the path of meaning of communication as contained in its effect
on the 'conditional-probability matrix' of the individual. 'Meaning' of a
received message is defined as "the selective function of the message on the
ensemble of possible states of the C.P.M." He ends with "Unfortunately the
completion of a truly basic language on these lines waits on our
understanding of the human C.P.M." (Here MacKay leaves semantics and comes to grips
with a central issue in pragmatics. The reader may have caught a glimpse of
sympathy with MacKay in our earlier comments in (1), when we were not so
fully aware of positions as now. The issue further clarifies in our current
NASA work (72), in particular CR-129, and our December 1965 report (64) in
which we define for the first time what makes up the content of the human's
performance or state matrix, and thus lend substance to MacKay's speculations.
The paths are even closer, though we have not met, in that both MacKay and
we are empathetically involved with Warren McCulloch. We suspect, for the
record, that McCulloch is in a sub rosa search to highlight the work of all
of those people who can contribute to the working of the brain!

In fact, it is the content of current work we have recently started to
undertake a demonstration of the state of what we call the
physiological-psychological oscillator system in the human, or what MacKay
refers to as the C.P.M. To add confusion to the dates, and indicate our
independence, the identification of oscillator states in the human began in our
pressure suit evaluation work in about 1946-1948, received confirmation in our
1956 clothing-heat regulation studies, and bloomed into a full biological
theory in our 1963-1965 NASA studies. The frame of reference was not Wiener's
or Shannon's communications theories but our own 1947-1952 theories of the
non-linear response of physical systems. In this we were inspired by the work
of Minorsky, first made available to us during the war, and later formalized in
his DTMB report, INTRODUCTION TO NON-LINEAR MECHANICS. Work in non-linear fluid
mechanics was facilitated by being led back to Poincare and the Russians through
Den Hartog, Routh, and Minorsky. It is true that young electrical engineers
and control engineers were discovering similar material through Nyquist, but
the young mechanically inclined must be forgiven for having tracked the path
through mechanics - including astronomy - and not through electrical networks
but through the theories of vibrations.)

Thus, it is not true, as stated by Cherry (51), that no theory of
pragmatic information has been published corresponding to extensions of existing
theories. MacKay's is a perfectly valid descriptive one, and our December
1965 report (64) - although it is later - is the foundation for its
realization. The mathematization can come after the experimental data are more
fully developed.

Cherry continues his discussion in the line of the Cartesian dualism of
the external or real world and the internal or mental world. This creates
the mind-body schism. There are those, for example, who consider subjective
matters as scientifically indecent, an excessive zeal for (an impossible)
detachment. Cherry proposes to see two kinds of observers - one an observer,
in the Bridgman sense, involved in the measurement, and the other who can
observe and report, but can make no observations upon thoughts other than his
own. The work of Good is brought in (67) (or see his chapter on the
mind-body problem in Scher, THEORIES OF THE MIND). There is also some later
discussion of MacKay's work in (52), 1961.

(The issues are joined in the pragmatics of information - not in its
semantic, or syntactical problems, or the statistical nature of language
messages - around what relates message and user in effects. The issue, well
discussed by Gunther in January 1966, is that the flux of events in time - may we
substitute the connotation of information? - has proceeded with two different
views, an emanative and an evolutionary. The emanative, in which all unfolds
from a unity, is reversible, deterministic, describable by a two-valued logic.
The evolutionary (even if things started from a unity, they can change) is
irreversible, granular, indeterministic. It is illustrated in the mind-body
problem; it requires a meta-language outside for non-two-valued logics. This
is of concern to a logician, because he cannot currently build a computer of
adequate function, except by two-valued logics; he cannot deal with the
problem of self-awareness, and self-adaption. Yet the human can. Therefore, the
human is not a 'computer' based on the two-valued Boolean algebra.

This  is  the  theme  which  was  stressed  in  our  unpublished  1957  "Philosophy 
for  Mid-Twentieth  Century  Man."  It  is  one  of  the  four  problems  undertaken  in 
our NASA biophysics studies. It is the problem for which we have proposed
provisional  answers  in  our  December  1965  report. 

However, it is very pleasing to find that our work is funnelling down
the course that has been developing in this century.

From Russell's formalization of the laws of two-valued logics, and
Carnap's conceptualization of the semantic problem, to Bridgman's concept of
operational significance, and the shaking concepts of Godel, foundations were
laid for the works of Turing and Post, and the applications of mathematics,
both in the form of analysis and statistics, under the development by Fisher
to translate the problem of 'information' to a scientific-engineering base
from a philosophic base. We proposed the line Gabor, Wiener-Kolmogoroff,
Shannon, McCulloch, Bar-Hillel-Carnap, MacKay, and now our work.

In our view, the human is represented by a repertoire of analogues that
are internal oscillator patterns, possessing both transient and steady-state
character, that are evoked by the message content of the external milieu that
impacts on the system. It is this repertoire of 'melodies,' plus his guidance
computer, that represents the human. This is to be regarded as the mechanistic
embodiment of what MacKay wanted to be a 'conditional probability matrix.'
'Meaning' is to be contained in how it affects the patterned repertoire.
However, working out the physics and mathematics of this system will take some
future doing. It is pertinent to follow the thematic thread in which, from
the Maxwell-Boltzmann derivation on, a path of statistical 'mechanics' was
used. In many systems it is not really a statistical 'mechanics' because that
makes use of Newtonian mechanics for the explicit laws of 'atomistic' change.
With no such laws, one can only regard the problems as 'statistical kinematics'
and worry about the form that exchange 'forces' take. What results is a
distribution in phase space and entropy-like and thermodynamic-like properties.


Maxwell-Boltzmann, Gibbs, Einstein, Nyquist, Shannon in communication,
Brillouin (for example, we used to play with such concepts during the war in
setting up the 'thermodynamics' of traffic, so that the way of thinking
should not be regarded as too marvelous or strange), Kerner in biology of
interacting species, Bar-Hillel-Carnap in semantics are all examples. In
fact, it is a point that we stressed in (1), p. 85-91. The essence is that
an equilibrium state of system states, and of canonical ensembles of such
systems, arises with equations of change.

To  apply  this  to  'meaning'  in  the  sign  sense  or  the  semantic  sense  is 
not  complete;  it  is  'kinematics.'  The  'dynamic'  analysis  must  be  done  at 
the  level  of  'pragmatics'  that  takes  meaning  in  the  brain  into  account.  The 
use of incomplete sets is the same argument we faced in the solution of the
equations  of  hydrodynamics,  described  in  two  ONR  reports.) 


5.  PSYCHOLOGY 


Information theory and some aspects of psychology are illustrated in
Quastler (68).


6.  COMPUTERS 


Although the theory and technology of computers do intersect with the
field of information sciences, and within this second class of problem in
particular, the computer field - just as communications engineering - has sped
so far from the field of intersection that it must be separately considered. The
literature of the Eastern and Western Computer Conferences can be used
profitably for that purpose.


7.  INFORMATION STORAGE AND RETRIEVAL - THE LIBRARY PROBLEM


The growth of interest in this problem can be traced in (52), 1956; (54),
March 1958; (55), November 1958; (52), 1961; (69), and (70). In (52), 1956,
Fairthorne and Mooers are alone. However, by comparison with (54), one quickly
finds that Mooers was tackling the problem of information retrieval as temporal
signalling, his concept of Zatocoding for the mechanized organization of
knowledge (the use of semantic and syntactic descriptors that describe document
content), and so forth from 1950 on; that Luhn, at IBM, was tackling the problem
of automation and information since 1952; that Fairthorne was concerned with
document retrieval and other routines since at least 1955; Dodd, 1955, etc.

Thus mechanizing the search for documentation and content had come into
prominence by the 1950's. The publication, American Documentation, is a useful
source.

The problems of interest, with economic impact, were chemical abstracts,
the patent office, USAF data handling systems for intelligence - to mention
some of the more obvious ones. The Taube-Wooster symposium (54) summarizes
some of the classification routines and devices that were available or
conceived openly at the time. The attendees are indicative of the range of
interests. (It is hardly fair to consider that any significant body of theory
or science was being described, only a community of interest.)

The later conference that year (55) cast a much wider net. There is a
much more articulate discussion of user's needs in Volume 1, and some of the
things that had already been done in documentation. In Volume 2, Areas 5 and
6, study is proposed on the organization of information for storage and search,
system design and theory. Subjects of some significance that are discussed
are semantic content (Vickery, Meredith), some crude topology (Gardin),
experimental hierarchical coding (Koelewijn, Liebowitz, Killer, Claridge). In
panel discussion, the opinion was expressed that not much progress would be
made until a rigorous mathematical model of storage and retrieval systems
existed, though this seems to be far from the true need.

(After  reviewing  Section  5,  we  could  suspect  that  what  was  basically 
needed  was  engineering  attack  with  such  equipment  then  at  hand  -  cards,  punch 
cards,  film,  etc.,  all  with  simple  mechanization,  to  see  what  sort  of  ingenuity 
and  success  would  be  achieved  in  mechanization.  The  wealthier  could  use  more 
expensive  'tools'  such  as  computers.  The  measure  of  this  may  be  taken  by  a 
view  of  Area  6.) 

In  Area  6,  one  gets  the  impression  that  Vickery  and  Fairthorne  were 
laying the basis for computer programs for document retrieval. (An
information retrieval system is defined by Vickery as any device which aids access
to documents specified by subject, and those associated operations.)

(The papers in this Area 6 did not change our opinion. The subject seems
still open for economic exploitation by the cleverest or the largest, e.g., by
small cheap effort such as the Peek-a-boo system might be considered, or large
scale computer effort. The conclusions here would be similar to pattern
recognition. Depending on what you want to pay, you can get a certain
magnitude of results, the answers to be shaken down by experimental trial.
Theory - if any - is to come after there is enough development to note what
boundaries have to be cracked.)

In the discussion (by quite a distinguished panel), the evolution of a
complex network was used as analogue. It proceeds in steps with multiple
loops. "Mechanisation and automation of such systems has not necessarily
reduced the complexity of functional separations ..." (The author made the same
point in a discussion on the automatic factory a few years earlier at a Gordon
conference, that system optimalization does not mean automating every link,
or minimizing the number of loops, only determining what optimalizes
performance criteria. These we feel our way to by quantum jumps.) The chairman,
Dr. Tukey, proposed the steps of providing a theory that could encompass
existing and reasonably feasible systems, conceiving and evaluating functional
hardware, and then attempting experimental trial by 'classical retrieval.'
(We echo the same thought.) Minsky emphasized the capability
of the modern computer, in particular in heuristic programming, i.e., what to
try first, and how to use results to modify action. Mandelbrot urged study
of taxonomic trees.

One may close with the librarian's comment (Mr. Cleverdon). They were
trying to find a statement of what librarians have been doing. This has
heated up librarians a little. However, now that some library operations can
be mechanized, people must understand why librarians do many things. Thus,
experiments are needed. (We concur heartily. We have many times urged in
similar contexts, observe the 'engineer,' or 'practitioner,' or 'clinician.'
If you 'wire' together a number of skilled practitioners to perform a task
they have some competence in, then you are watching a very skilled 'computer'
or 'information machine' at work. It has an extensive 'memory' which can
always be tapped. This explains to us our personal creed - we can't help the
expert in building a foundation or advancing his field until he is stuck.
Then - by continued observation and query - we can determine a foundation or
generalization, and where science can help. This is very much the
description of an optimal human information process, as follows:

The known experimental surmises - 1, 2, 3, etc. - have the best a priori
equal probabilities of working, by Bayes' a priori theorem. Put in any other
wild ones that cover your view of the universe. From these estimate by
Gestalt, by induction, the line to infinity. Then you have a hypothesis with
Bayesian probabilities that can be used to rescan, over and over, until a high
probability emerges. This is the area of practice, or theory. Fix on this,
until it proves wrong; rescan, etc.)
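
The scan-and-rescan procedure just described can be sketched as a loop. This
is a minimal illustration only, under assumptions that are not in the report:
the surmises are given hypothetical names, each round of evidence is
summarized as a likelihood per surmise, and "a high probability emerges" is
read as crossing an arbitrary threshold of 0.9. The Gestalt step of
estimating "the line to infinity" is not modelled.

```python
def equal_priors(hypotheses):
    """Assign every hypothesis the same a priori probability."""
    share = 1.0 / len(hypotheses)
    return {name: share for name in hypotheses}


def update(beliefs, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    weighted = {name: beliefs[name] * likelihoods[name] for name in beliefs}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("the evidence rules out every hypothesis")
    return {name: value / total for name, value in weighted.items()}


def scan(hypotheses, evidence_stream, threshold=0.9):
    """Rescan the evidence until one hypothesis passes the threshold.

    Returns the name fixed on (or None if none emerges) and the final beliefs.
    The threshold value is an assumption made for this sketch.
    """
    beliefs = equal_priors(hypotheses)
    for likelihoods in evidence_stream:
        beliefs = update(beliefs, likelihoods)
        best = max(beliefs, key=beliefs.get)
        if beliefs[best] >= threshold:
            return best, beliefs
    return None, beliefs


if __name__ == "__main__":
    # Hypothetical surmises and a hypothetical, repeated round of evidence.
    surmises = ["surmise_1", "surmise_2", "surmise_3"]
    one_round = {"surmise_1": 0.8, "surmise_2": 0.4, "surmise_3": 0.2}
    fixed_on, beliefs = scan(surmises, [one_round] * 10)
    print(fixed_on, beliefs)
```

If the hypothesis fixed on later "proves wrong," the same function can be
called again with the enlarged evidence, or with new wild surmises added to
the list, which corresponds to the "rescan, etc." of the text.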

It seemed clear that the field would then be taken over by the large
scale computer after 1958, and (70) in fact suggests that this is what
happened. That reference is useful as a philosophic guide to what linguistic
questions are associated with the field today, and to more recent literature
such as PROCEEDINGS OF A SYMPOSIUM ON MECHANIZATION OF THOUGHT PROCESSES,
1959; CURRENT R AND D IN SCIENTIFIC DOCUMENTATION, NSF Semiannual; IBM
INFORMATION RETRIEVAL SYSTEMS CONFERENCE, 1960; THIRD INSTITUTE ON INFORMATION
STORAGE AND RETRIEVAL, American U., 1961; Mooers, "The Next Twenty Years in
IR: Some Goals and Predictions," 1959; Vickery, ON RETRIEVAL SYSTEM THEORY,
1961.


Two interesting articles are by Melkonoff and Maron (70). Melkonoff
describes languages, up to third level, for compiling and between computers,
and the need for orientation toward logical data-processing problems rather
than arithmetic (i.e., the problem with pragmatics is joined).

Maron's papers are probably as sophisticated as the logician can bring
to bear today on language data-processing.


8.  MACHINE  TRANSLATION 


The literature is essentially the same as for the previous subjects.
One may add a reference like (56) for specialized content. Early names are
Yngve, Chomsky, Bar-Hillel, Dostert, Edmundson, Oswald, Oettinger. The
machine translation of Russian has furnished much of the impetus. The papers
of Masterman et al. and Oettinger et al. in (55) are good starting content.

(It is likely that the machine translation problems became a subject of
large-scale computer investigation earlier than the storage and retrieval
problem. However, the conclusions to be drawn are the same. The fact is we
proposed a joint experimental machine translation program with Consultants'
Bureau in about 1959. It contained the same conclusions we perceive much more
clearly now. Humans are the best information machines from which to discover
human information methods, i.e., from which to discover pragmatics.)


CLASS 3 PROBLEM - INFORMATION SCIENCE OF THE BRAIN


Cybernetics, or some sort of theory of guiding machines (or, as the
Russians insist, 'information' machines) begins formally with Wiener (71).
Its significance in the organization of the biological system is discussed
in (72). However, an early example of its intrusion into the information
field is the MacKay-McCulloch paper (73), or the series of papers in (52),
1956, by Gregory, Allanson, Taylor, Wall et al. The stage was thus set for
the development of a line of problems appropriate for an information theory
or a theory of guiding mechanisms and methods, i.e., of form and function in
the brain. We intend to touch on some of these.

(It is not our implication that the MacKay-McCulloch paper was the first
one dealing with the information content of biological systems. This had been
explored previously in the senses. Beyond this, Gregory validly points out
that Adrian in the 20's was responsible for developing a communications view
of neural information and coding in the nervous system (74). However, the
joining of protagonists - the interest in an information view of the
information in the brain and the neurophysiological information of the brain -
involved the fullest cooperation of communications scientists and neurological
scientists. Wiener-Rosenblueth-McCulloch-von Neumann illustrates this;
MacKay-McCulloch illustrates it again. Adrian-Van der Pol could easily have
illustrated this 20 years earlier, for they did know each other. No
physicist can avoid paying his respects to Helmholtz. However, at the moment we
are concerned with the modern marriages that have arisen from the birth of
'cybernetics.'

To  lay  a  background  for  further  discussion  we  must  clarify  our  views  on 
a  central  concept  of  'feedback.'  Biological  scientists  are  surprised  when  we 
question  the  concept.  The  purpose  is  not  to  destroy  the  idea  but  to  put  it 
in  perspective.  This  report  has  enriched  our  ideas.  Earlier  discussion  is 
contained  in  (72),  notably  the  1st  and  3rd  reports,  and  (64),  the  5th  report. 

We  propose  to  discuss  the  control  concept  of  feedback. 

We do not believe that Wiener would have dismissed our ideas, and he might,
in fact, have considered them identical to his own; however, we have not been
able to get them formally out of his work.

Imagine  that  there  exists  a  complex  network  that,  in  fact,  is  capable 
of performing its function. Suppose you want to improve its control
characteristics. We visualize that it may be well regulated in a variety of ways.

You can take a chain out from any closed loop at any point by opening
it, so that the loop contains a measure of what is going on. Typically this
may  be  a  measure  of  flux  or  potential,  and  the  point  may  be  at  the  load  or 
wherever  the  serious  business  is  going  on. 

In the first use of 'feedback' a signal was fed back, by coupling with
an appropriate sign, to another portion of the network, typically near the
'input,' or an upstream branch or loop. To many, this was viewed as the
beginning of an 'information' link. However for purely linear networks, we
would object to the view that this was really information flow, in that all
of the system response is really 'determinate,' given the course of input.

From  a  linear  point  of  view,  the  network  possessed  an  anomalous  signal  - 
'noise'  -  coupled  to  the  network  in  some  non-interacting  way.  The  noise 
could  act  on  the  network,  but  it  was  not  clear  how  the  network  could  act  on 
the  noise.  The  'purpose'  of  feedback  was  to  take  advantage  of  some  symmetric 
properties,  expressed  as  phasing  characteristics,  by  which  certain  'compensa¬ 
tion'  properties  could  be  achieved.  This  was  quite  an  achievement  concep¬ 
tually,  because  casual  opinions  would  have  been  that  noise  must  be  cumulative 
faster  than  signal,  yet  here  a  realizable  scheme  was  demonstrated  that  showed 
that  signal  could  be  saved  in  the  face  of  noise.  However  the  basic  problem 
inherent  is  that  the  network  already  shows  the  evolutive  non-deterministic, 
granular,  quantized  enfolding  of  its  response,  in  time,  that  Gunther  refers 
to,  and  begins  to  illustrate  the  mind-body  problem  of  interaction  at  the  low¬ 
est  possible  level. 

The essence of the matter is that the network - as 'proved' by its
sustained 'noise' - is not really totally a linear problem, even though
Nyquist showed how one might retain much of a nearly linear description. The
significance of this will gradually unfold.

The  problem  of  feedback  -  in  the  automatic  control  sense  -  went  one 
step  further  than  branching  out  a  sensing  loop.  The  'state'  of  the  output 
was  branched  out  and  put  into  comparable  measure  with  the  input  to  determine 
an  'error'  difference,  generally  of  a  non-interacting  form,  which  would  then 
be  power  amplified  into  an  interacting  form  so  as  to  take  some  sort  of  cor¬ 
rective  action  to  minimize  the  error  in  accordance  with  some  time  dependent 
differential  operator.  Wiener  made  contributions  to  the  specific  optimaliz¬ 
ing  question.  This  is  not  the  same  problem  as  the  former,  which  was  a  prob¬ 
lem  of  'compensation'  in  a  given  network  that  dealt  with  an  unknown  that  could 
not  be  carried  within  the  theory;  namely,  'noise,'  by  taking  advantage  of 
some  phasing  characteristics.  It  is  interacting.  The  second  does  not  even 
have  to  have  a  complete  network.  A  two  terminal,  open-looped  power  element 
can  be  controlled,  i.e.,  have  input  and  output  put  into  concordance,  by  a 
fed-back  branch  that  closes  one  loop.  However  this  branch  does  not  have  to 
be interacting. One might describe it by saying that one has tried to 'sneak'
some  information  measure  from  the  output,  and  tried  to  reintroduce  a  'compen¬ 
sation'  in  a  form  somewhat  like  noise  to  control  the  action,  i.e.,  coherent 
'noise'  used  to  control  undesired  'noise.'  Insofar  as  the  input  character  is 
not  expected  or  predictable,  then  the  feedback  loop  deals  with  the  'informa¬ 
tion'  that  mirrors  this  'noise'  for  the  corrective  action.  In  such  a  sense, 
a feedback controller is an 'information' machine and is likely thus understood
by all those expert in automatic control. One must again give credit to Wiener
for his exposition of optimal design criteria when the input, though not
predictable, is stationary or drawn from an ergodic universe of signals. It is
this link that binds his effort to Shannon's as a very important precursor.
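
The automatic-control sense of feedback described above (branch out the output, compare it with the input to form an 'error,' amplify the error, and apply it as corrective action) can be illustrated by a minimal sketch. The plant is assumed here to be a simple integrating element and the controller a pure proportional gain; the names `simulate_feedback`, `gain`, `disturbance`, and `dt` are hypothetical and are not drawn from the report or from Wiener's work.

```python
def simulate_feedback(setpoint, gain, steps, disturbance=0.0, dt=0.1):
    """Close one loop around an assumed integrating power element.

    setpoint    -- the input that the output should be brought into concordance with
    gain        -- assumed proportional amplification of the error signal
    disturbance -- assumed constant unknown drive ('noise') acting on the element
    """
    output = 0.0
    errors = []
    for _ in range(steps):
        error = setpoint - output             # compare output with input
        errors.append(error)
        drive = gain * error                  # amplify the error into an acting form
        output += dt * (drive + disturbance)  # corrective action on the element
    return output, errors


if __name__ == "__main__":
    final, errs = simulate_feedback(setpoint=1.0, gain=2.0, steps=50)
    print(final, errs[0], errs[-1])
```

In this sketch the error shrinks by a constant factor at each step when there is no disturbance. With a constant disturbance d the loop settles at an error of -d/gain, which is one simple way of seeing that the loop only 'knows' about the disturbance through the error it measures.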

However such noise 'information' is syntactic. It deals with the formal
abstract character of signals, and in fact sees little difference between
coherent noise and incoherent noise, i.e., to the anti-communist it replies,
"I don't care what kind of communist you are!" It is open-looped in the
sense of 'purpose' of the network; disembodied minds and bodies without minds
and universes with or without mind or body can exist. It is, at best,
kinematic, i.e., symbolic, in space and time.

It  is  to  the  credit  of  the  philosophers  with  linguistic  background  that 
they  were  able  to  bring  in  the  concept  of  syntactic,  semantic,  and  pragmatic. 
We have been pragmatic and seeking 'pragmatic' description for a long time.
It is only now that some focus emerges. Again we can allude to our
hydrodynamics work and the concept stressed in (1) that was used by Shannon, and
by Nyquist, by Boltzmann, by Brillouin, etc., the statistical mechanical
consequences of there being many active 'atomistic' elements in an ensemble. In
hydrodynamics the atoms are atoms, in chemistry molecules, in solids
crystallite domains, in cells the protein aggregates, in biological systems the cells,
in  society  the  human.  Most  authors  have  chosen  the  descriptive  and  'mystical' 
path  of  entropy,  and  order,  etc.  It  is  much  simpler  to  consider  statistics, 
and  simple  physics,  and  geometry  and  Bayes. 

We  do  not  propose,  at  this  time,  any  fanciful  description  of  'semantic.' 
We  are  satisfied  to  distinguish  minimally  two  elements  -  the  formal,  ideal¬ 
istic  elements,  and  the  real  system  element. 

On one hand, philosophically we must regard every component - of
systems - as nearly coextensive conceptually with the universe. Every element
implies  its  negation.  The  stone  implies  the  non-stone,  thus  the  entire  uni¬ 
verse  outside  of  the  stone.  This  is  not  metaphysical  nonsense;  we  can  refer 
to  the  communications  books  on  the  existence  and  description  of  monochromatic 
wave  trains  to  recognize  the  same  conversation.  Thus  the  brain-non-brain, 
universe-non-universe  problems  and  the  entire  two-valued  logic  problems  begin. 

Pragmatically,  the  physicist  finds  that  things  have  a  finite  range  of 
influence.  Philosophically  and  physically  not  really,  for  'Eventually  all 
things  crumble  into  dust.'  To  avoid  this  impasse,  we  finally  get  away  from 
the 'equal measure' problem, of being-non-being, etc. exemplified by decays
like e^{-kt} which take an infinite time to disappear. As an aside, the
advantage to having been brought up as a non-linear fluid mechanical physicist
rather than an electrical physicist, is that whereas the latter thinks of such
exponential processes as his prototypes for 'all' time, we 'know' that our
pressures decay by laws with finite cut-off times, or we 'know' how to make
resistances that have any kind of cut-off you wish, i.e., we very quickly
become 'pragmatic.' This is far from trivial.
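
The contrast between an exponential decay and a decay with a finite cut-off time can be made concrete with a small sketch. The linear law dp/dt = -k p gives p0 e^{-kt}, which never reaches zero. As an illustrative assumption only (the report does not state which non-linear law is meant), a square-root resistance law dp/dt = -k sqrt(p) is used here; it has the closed-form solution p(t) = (sqrt(p0) - k t / 2)^2 and reaches zero at the finite time 2 sqrt(p0) / k. All function names are hypothetical.

```python
import math


def linear_decay(p0, k, t):
    """Solution of dp/dt = -k * p: positive for every finite t."""
    return p0 * math.exp(-k * t)


def square_root_law_decay(p0, k, t):
    """Solution of the assumed law dp/dt = -k * sqrt(p), clamped at zero."""
    root = math.sqrt(p0) - 0.5 * k * t
    return root * root if root > 0.0 else 0.0


def cutoff_time(p0, k):
    """Finite time at which the assumed square-root law reaches zero."""
    return 2.0 * math.sqrt(p0) / k


if __name__ == "__main__":
    p0, k = 4.0, 1.0
    for t in (0.0, 2.0, cutoff_time(p0, k), 10.0):
        print(t, linear_decay(p0, k, t), square_root_law_decay(p0, k, t))
```

Under these assumptions the non-linear decay is exactly zero from the cut-off time onward, while the exponential remains small but non-zero at any time one cares to name.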

The impasse is broken as follows. It is feasible to seek apt non-linear
'explanations' for real phenomena for segments of space and time that are
bounded both above and below. This concept has been growing with us since
1950.

From below, it is bounded by the relaxation time and mean free path
associated  with  the  statistical  mechanical  processes  associated  with  the 
atomistic  elements.  From  above,  it  is  bounded  by  the  time  and  space  over 
which  form  and  function  can  be  separated.  At  the  present  we  are  not  prepared 
to  be  more  precise  on  this  point.  Pragmatically,  we  feel  our  way  to  where 
and  when  the  walls  crumble.  The  physicist  can  only  proceed  by  embedding  his 
problems  in  a  suitable  bounded  and  boundary  valued  problem,  good  only  within 
a definite space and time. Given a universe that exists, in which one
embeds such and such systems and I, then certain interacting and nearly
noninteracting relationships hold. 'I' may consider many of the non-interacting
relations to be 'observer' relations, however 'I' will find the uncertainty
relations  involving  system  and  'observer,'  and  non-interacting  results  will 
occur.  The  sun  will  act  on  many  systems  with  'no'  interaction,  as  Icarus 
found,  and  'thermal'  noise  will  thereby  be  generated.  All  of  this  I  must  put 
at  the  boundary.  The  electrical  network  analyst  is  careless  in  this.  For 
example,  he  almost  never  has  the  thermodynamic  interaction,  even  though  this 
was  Nyquist's  brilliant  point.  The  paradoxes  of  equal  measure  easily  arise. 

The  problem  is  that  all  block  diagrams  are  not  equivalent,  even  though 
some formal mathematical equivalence seems useful. An m in mx can be erased.
A physical mass in a system cannot be, nor can it be replaced by a negative
mass to equate it to zero. Thus the structural and the formal properties are
not the same. In linear measure, the stone and the non-stone have equal
measure, or the 10 hp motor and the meter reading observer. Equality of measure
in the block diagram only becomes meaningful when the same power is controlled
by both of the points of intersection.

Equation  sets  must  be  carefully  drawn  on  the  basis  of  their  interaction 
properties,  as  well  as  their  formalistic  block  diagram  properties.  This 
means that papa's command to stop is just as real to the computing real brain
as  a  brick  wall  or  a  repression  formed  in  childhood. 

We  thus  take  an  entirely  different  view  of  networks  than  most  other 
scientists.  We  are  concerned  that  the  energetics  control  measure  of  each 
term  in  our  equations  be  well  determined;  that  they  be  isomorphic  over  the 
space  and  time  that  they  are  to  be  used;  that  the  equations  be  complete  for 
the  boundary  conditions;  that  our  time  and  space  scale  be  determinate.  The 
methods  of  statistical  mechanics,  carefully  applied,  for  near-equilibrium 
situations then lead to conditions of equilibrium, i.e., to equilibrium
distributions among the atomistic elements, illustrated by the Maxwell-Boltzmann
distribution, or Johnson-Nyquist noise, or Brownian-Einstein motion, etc.,
and to equations of change. We discussed this briefly with references in (1).
Such isolated equation-of-change systems do not lead to the linear network
equivalent - R, C, L, and voltage sources - of electrical network theory plus
Johnson or Schottky or Brownian noise - the latter as in the electromechanical
galvanometer - but to such regimes as linearly stable motion of linear
network theory or laminar flow, and the non-linearly stable spectrum motion such
as in turbulence, or perhaps of atomic and nuclear systems. This is
illustrated in (75). This report and its earlier one illustrate the primitive
state and present difficulties in finding practical solutions.


The  essential  step  is  that  the  equation  sets  for  a  system  must  be  em¬ 
bedded  at  the  highest  level  at  which  the  response  of  all  such  systems  is 
ergodic,  i.e.,  that  form  a  stationary  system  of  system  states,  so  that  any 
one  system  in  any  one  operating  condition  can  be  viewed  as  enfolding  a  phase 
space  path  that  is  very  close  to  all  other  systems.  In  linguistics,  this  is 
the pragmatic level - that doesn't even depend on words for communication -
not the semantic. In systems science, one must use equations such that each
has  equal  hierarchical  measure,  else  the  distributed  phase  space  is  not  prop¬ 
erly  representative.  The  ideas  here  are  still  very  new  and  poorly  defined. 

Nevertheless,  this  is  the  nature  of  the  systems  problem.  Similar  to 
the  procedure  that  was  used  in  hydrodynamics  -  of  the  discovery  of  the  steady 
states  and  dynamics  of  the  hydrodynamic  field  and  then  the  details  of  the 
spectrum  of  turbulence,  or  in  other  'atomic'  spectroscopic  fields,  we  are 
attempting  to  set  up  the  experimental  spectroscopy  of  the  biological  system. 
More recently we have found another investigator, Goodwin (76), whose ideas
are quite related.

In  viewing  the  brain,  with  its  'atomicity'  at  the  cellular,  neuron,  and 
various specialized systems - not all of whose characteristics are well
understood - it is apparent the determination of mechanisms is an horrendous task.
Nevertheless, the job is done, as are all such analyses, by viewing the
spectrum of effects in space and time, over isolated portions of space and time.
The promise held out in our 1961 Army study on the life sciences is beginning
to  flower.  A  definite  sustained  spectrum  of  time  effects  is  beginning  to 
develop.  It  is  with  the  background  of  dynamics  that  has  been  developing  in 
(72)  that  we  will  explore  the  information  theory  of  the  brain.) 

References (77) to (103) are some of the interesting sources.

In (52), 1956, Allanson touches on the properties of neurons, as
discussed by Eccles in 1935, to describe properties of random neural nets from
a non-linear stability view. Uttley's work on signals in the nervous
system is considered, as are Lashley's anatomical cell counts in the visual field,
to note whether neuron delay lines could be used. It is evident neurons and
electronic elements were on people's minds. Taylor shows attempts at analogue
simulation of neural nets. Wall et al. discuss experimental data directed
toward estimating the average frequency associated with information capacity
in neural channel pulses. The possible relation to earlier work by Barron
and Matthews in 1935 is brought up. Quastler's paper attempts to indicate
the channel capacity of various human systems or 'channels.' He concludes
that he can find, in accordance with Licklider, a limit of about 25 bits per
second (McCulloch suggests a higher individual value of 50), an invariant
characteristic of the human in optimal conditions over periods of time. In
the decomposition of a field "in a single glance" he suggests up to 5 bits
for a single kind of information and about 20 bits for all kinds. The 'logon'
content, i.e., the dimensionality or number of degrees of freedom, of one
psychological perception is about 7. Good raises a pertinent question as to
the correlation between speed of response and rate of input of information but
this is not answered.

(At this point one can begin to see the kinds of problems that are go¬
ing  to  emerge  and  that  were  already  in  flux  of  discussion.  On  one  hand  there 
is  the  problem  of  transmission  in  neural  nets  that  had  been  covered  much 
earlier  by  McCulloch  and  Pitts  in  1943  and  1945,  following  a  line  then  to 
Ashby's  book  (77)  and  to  von  Neumann  (78)  on  how  brains  might  handle  informa¬ 
tion  by  known  analogies.  On  another  hand,  there  is  the  question  raised  by 
Quastler  of  how  much  information  does  the  brain  handle.  We  would  like  to 
make  a  few  comments  on  the  latter. 

Quastler  treats  some  problems  which  were  known  to  us  earlier  in  metrol¬ 
ogy,  and  some  that  were  not  known  to  us  until  later.  This  is  no  discussion 
of  priorities,  just  of  results  viewed  independently.  First,  it  has  been  our 
'layman' impression of a round number 0.1 second response time for brain
activities. When we encountered Homer Smith's discussion of piano playing - (72),
first report - we did some independent work and found about 9 notes per
second readable by moderately competent pianists. Quastler finds 5-6 keys per
second. The difference is not important. However, in Quastler's terms, this
would be 22 bits per second because of the selection from a certain number of
keys. We cannot view the result this way. We still see a system fast enough
to govern a simple field complex in about 0.1 second, i.e., that is made up
of such a number of reflex arcs. Thus the brain is capable of controlling 10
'simple' states per second. In proposing such an issue as brain
dimensionality, i.e., 'logon' content, we are willing to accept that seven 'factors' is
the maximum the brain can juggle. We aren't certain how to relate scale
position and brain states, but from our literature, we concur with the 1951
Garner-Hake studies that a scale can be estimated to 30 parts with a
reliability approaching one part in 10-20. A summary of about 4 binary digits,
i.e.,  16  states,  per  'instant,'  and  thus  supermaxima  of  25-50  bits  per  second 
for  a  given  degree  of  freedom  is  possible.  The  gain  from  many  channels,  such 
as  7  degrees  of  freedom,  represents  20  bits  or  so,  a  problem  of  memory,  in 
which  apparently  the  body  can  only  bring  so  many  systems  into  action.  If  we 
accept  the  25  bits  per  second  this  would  retranslate  to  10  elements  per  sec¬ 
ond  for  a  single  degree  of  freedom,  or  the  same  speed  for  about  3  degrees  of 
freedom,  i.e.,  3  elements  per  second  for  3  different  channel  tasks;  or  using 
memory  up  to  about  7  channel  sources  can  be  viewed. 
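
The arithmetic behind these figures can be checked with a short sketch. It assumes the usual Shannon measure for a choice among equally likely alternatives, log2(N) bits per selection, and treats the rates quoted above (16 states per 'instant,' about 10 instants per second, 25 bits per second) as round numbers. The function names are hypothetical, and the 'implied alternatives' figure is only a back-calculation from the quoted rates, not a value given in the sources.

```python
import math


def bits_per_selection(n_alternatives):
    """Shannon information of one choice among equally likely alternatives."""
    return math.log2(n_alternatives)


def bit_rate(n_alternatives, selections_per_second):
    """Information rate when such choices are made at a steady pace."""
    return bits_per_selection(n_alternatives) * selections_per_second


def implied_alternatives(bits_per_second, selections_per_second):
    """Number of equally likely alternatives that a quoted rate would imply."""
    return 2.0 ** (bits_per_second / selections_per_second)


if __name__ == "__main__":
    print(bits_per_selection(16))            # 16 states per 'instant'
    print(bit_rate(16, 10))                  # at about 10 instants per second
    print(implied_alternatives(25.0, 10.0))  # alternatives per element at 25 bits/s
```

On these assumptions, 16 states per instant at 10 instants per second gives 40 bits per second, which sits inside the 25-50 range quoted above, while 25 bits per second spread over 10 elements per second corresponds to 2.5 bits, or between 5 and 6 equally likely alternatives, per element.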

This  strikes  us  as  being  within  the  background  of  Good's  question.) 

In (68), Stroud's paper deals with the brain in its 'kinematic' content
of psychological time, pointing to its non-continuous nature, its
fragmentation in the 0.05 to 0.2 second interval. He refers to Jacobson's estimate
of about 4 x 10” bits per second, or 4 x 10^ bits per moment as the state
information carried by the brain, and suggests that it is much larger than 100
bits which is sometimes given. It is the cross-purpose discussions of such
estimates as 4 x 10* bits per moment in memory, 5 bits per moment in action
and reaction, 100 impulses per moment given as "previous estimates of the
maximum information-carrying capacity of the nervous system" by Wall et al. in
(52), 1956, or much smaller estimates made in their earlier work that framed
the information capacity of the central nervous system question at the
beginning of the field 10-15 years ago. Quastler has also touched on the problem
(68).


(It was interesting to hear Stroud repeat his paper title at the
January 1966 New York Academy meeting. He stated that there is very little he
would change. We can consider the following 'confirmations.' Schaltenbrand,
on consciousness, made the point that in the eye the border between flicker
and pitch is about 0.05 seconds, i.e., one goes from an event to a modality.
Ephram, on onset of perception, also makes the point that there is a
processing period of 0.06-0.07 seconds in which the onset of a perception is delayed.

We  have  used  the  concept  of  a  'posture,'  which  really  is  quite  similar 
to  Stroud's  'moment,'  and  to  those  of  the  other  speakers,  with  a  variety  of 
different details. The formation of significant simple 'postures' at rates
approaching 10 per second is thus likely in brain motor control. The open issue
is  the  content  available  to  the  nervous  system. 

Because of its appropriateness, we here suggest the hypothesis, somewhat
out of context, that is forming in our NASA biophysics work, that all of the
local neuromuscular regions of the body are mapped into the brain, and that
possibly all of the neurohumoral regions of the body are also mapped. Our
basic reason for this suspicion is that a near 10 cps vibration exists at all
times in all muscle, and is clearly evident in gross magnitude when animals
come out of anesthesia, or in shivering, convulsions, etc. In weak form or
otherwise, the analogue mapping of form and function alluded to in (64) is
invariably available as a shadowy analogue mapping of physical, or perhaps
better chemical, mapping of the system.)

In  (52),  1961,  Grossman's  paper  reviews  the  experimental  evidence  for 
a  constant  information  capacity  in  the  Shannon  sense  in  memory,  such  as  25 
bits per perception (7-8 decimal digits digested); or the Miller concept of
7  'chunks,'  or  degrees  of  freedom,  as  a  constant  number  of  items  irrespective 
of  source.  (We  have  favored  the  latter  on  first  thought.)  The  data  examined 
seemed  to  lie  in  between.  "...  recall  was  a  reconstructive  rather  than  a 
passive repetition process." (The results seem ambiguous.) Goldman-Eisler
investigates a very interesting problem that illustrates the computer nature
of the brain, namely in abstracting information from a complex picture,
there is hesitation in reply before phrasing a description and summary,
which  diminishes  with  repeated  trial;  and  that  pauses  occur  in  the  use  of 
words  with  low  transition  probability.  Thus  the  brain  uses  a  strategy  of 
planning  content  and  structure  verbally,  and  then  selecting  fitting  words. 

Neuron-like networks are discussed by Farley and Clark. "Essentially
nothing is known of the functional organisation of the nervous tissue of the
central nervous system which produces complex behavior." The work of
Pitts-McCulloch, their own computer studies, and Rosenblatt's perceptron studies
(starting from 1958) emerge. Reference is made to a 1960 book by Farley,
SELF-ORGANIZING SYSTEMS. The networks they simulate on computers seem to have
responses closer to networks of cell bodies and axons rather than neuron nets -
namely, initial thresholds, refractory periods, and rough exponential decay
after firing. Dendritic function is ignored, although wave-like spread seems
representable. The results are viewed as very primitive examples of
information transformation and control capabilities that may have little relation to
neuro-physiological models.


In discussion, Good wonders what the sixth conference will demonstrate
in models. (He validly calls attention to an excellent elementary beginning
in Hebb's 1949 book (79). One should also add (80) and (81).) Julesz
presents some Bell Labs work of Speeth and Konentsky.

A complex experimental model for neurophysiological functions is
attempted by Zemanek et al. Their inspiration, all from about 1950, was Ashby's
Homeostat,  Shannon's  Maze  Runner,  V.  Walter's  Conditioned  Reflex  Model.  They 
show  four  model  efforts  for  conditioned  reflexes.  (The  effective  lack  of 
discussion  suggests  that  no  one  -  at  least  at  that  time  -  was  really  ready  to 
comment  on  the  detailed  merit  of  any  model.) 

A paper by Minsky and Selfridge on learning in random nets basically
suggests  that  these  may  only  be  useful  for  small  local  jobs  and  not  for  per¬ 
forming  complex  tasks. 

The  paper  by  Papert  on  a  unified  account  of  some  perceptual  learning 
machines like those discussed by Uttley and by Rosenblatt (1958) is near
present levels of sophistication. It is not known whether these models resemble
the working of a brain, but they illustrate how certain complex brain
functions might be carried out by component populations not more numerous or
complex than the neurons. The theory of such conditional probability machines
is left to those with mathematical interest. Typically one may start from
(80). 


Kochen  at  IBM  begins  the  discussion  of  combinatorial  problems  which 
have  the  property  of  rapidly  growing  beyond  the  capacity  of  contemporary  com¬ 
puters.  There  is  the  possibility  of  simulating  human  cognitive  behavior, 
such as learning and inference, by a 'heuristics' of strategy. The computer
exercise is stressed, and similar work is referenced.

It  would  seem  clear  that  the  information  theory  of  the  brain  and  behav¬ 
ior  cannot  proceed  without  some  attention  to  the  work  of  social  worker,  psy¬ 
chologist,  and  psychiatrist  on  one  hand,  neuroanatomist,  neurophysiologist  on 
the other hand; and to the cyberneticist. It is not appropriate here to
discuss the problem with any depth. One can view (64) and (72) as our
rudimentary and speculative beginnings to bring about such a synthesis. However,
there  are  so  many  more  expert  pieces,  that  we  can  only  name  a  few  representa¬ 
tive  sources.  Reference  (82),  Young,  for  example  is  an  excellent  little  book 
discussing  the  brain.  Reference  (83)  is  an  excellent  example  of  a  potential 
nervous system decoding. (A competent investigator, Dr. Lipetz, is engaged in
an effort to demonstrate structural mechanisms involved.)

To obtain the full flavor of the cyberneticist - computer interaction,
one may scan such sources as (84) to (96). (It is clear, for example, from
the tribute to Wiener by Olson and Schade in (95) that we are pursuing a
similar path in considering the non-linear 'rhythms' or spectrum of
oscillations in the biological system, the concept of interactions, and of
synchronization.)

In closing we offer passing reference to a few interesting neurological
books on the brain, (97) to (103). They will indicate some of the content of
neurophysiological views, and the size of the gap that exists in brain
'exploration' or 'modelling.'

Summarizing, any possible connection between a theory of the brain and
the information sciences has been directed by client interests. It has been
mainly  motivated  toward  overcoming  the  discrepancy  between  human  built  equip¬ 
ment  and  the  obviously  more  compact  and  more  complex  system  performance  that 
can  be  seen  in  the  biological  systems  around  us.  It  has  really  revolved 
mainly  around  communication  engineer  problems,  such  as  how  to  make  a  compact 
airborne  computer  of  broad  capability,  how  to  build  more  general  purposed  tele¬ 
phonic  elements,  how  to  make  better  sensors,  how  to  compress  more  relevant  in¬ 
formation  and  process  data  into  a  given  transmission  channel.  We  believe  it 
is most useful to direct each question specifically toward the pertinent
engineering problem, which in the end is really what happens. This has been true
in  character  recognition,  machine  translation,  atomic  energy,  etc.  A  cynical 
view  might  be  that  the  more  fanciful  dressings  are  used  to  capture  the  cus¬ 
tomer's  imagination,  and  then  the  more  mundane  engineering  is  done  under  that 
cover.  At  least,  this  is  what  we  see  in  much  sponsored  research  today  (and 
likely  in  the  past).  Nevertheless,  there  still  remains  the  background  of 
scientific  problems  -  whether  'pure'  or  'applied'  -  that  the  serious  research¬ 
er  knows  are  holding  up  science,  and  its  exploitation.  This  is  often  more 
difficult  to  'sell,'  though  it  would  result  in  capturing  broader  imagination 
than  that  of  the  specialist.  The  issue  stressed  -  in  science  today  broadly, 
and in this project - is the interdisciplinary nature of the more difficult
scientific problems. The work of the cyberneticists, our work, etc. are real
examples of interdisciplinary efforts. However, the explorations must be
occasionally tempered by seeing what the experts in the specific fields are
saying  and  the  extent  to  which  the  interdisciplinary  transfers  are  meaningful. 

The  problem  -  in  the  brain  -  is  the  extent  to  which  such  work  as  ours 
and  that  of  the  cyberneticists  impacts  on  communications  engineering  (the 
'syntacticists' of communications), the librarian (the 'semanticist' of
communications), on psychology-psychiatry or on neurology-anatomy-physiology
(the 'pragmaticist' of communications), and on engineering, more generally,
finally; for this is what most often is the immediate patron interest - in the
present  case,  the  Army. 

Those who want to skim literature further beyond the present directed
aim would do well to start with the General Systems Yearbooks, starting in
1956.


SUMMARY AND DISCUSSION


1. The umbrella of the information sciences extends over a
number of subjects which belong to other disciplines and are not presently
separable, and those that have been successfully captured within its orbit.
The  peripheral  fields  are: 

communications  science  and  technology 

computer  science  and  technology 

mathematics  of  stochastic  processes 

data  processing  hardware 

library  science 

philosophy  of  science 

cybernetics 

measurement 

automatic control theory

linguistics

The  subjects  that  are  poorly  located  elsewhere  and  central  to 
information  sciences  are: 

statistical characteristics of signs of interest to the
human (this might be described as statistical 'semiotics,'
i.e., neither syntactics, phonetics, nor any other limited
sign response)

transmission  of  semantic  content  of  language  (statistical 
'semantics') 

transmission  of  pragmatic  content  of  language  (statistical 
'pragmatics') 

characterising  the  pragmatic  content  of  information  as  it 
exists  in  the  brain  (statistical  'mechanics'  in  the  brain). 

2.  What  remain  possible  for  the  information  sciences  to  capture, 
if it pursues the problems vigorously, are:

the  science  of  networks,  as  part  of  a  general  systems 
science  (Why?  What  is  important  in  a  system  is  what 
effective  'information'  really  is  in  transit.) 

the  practical  realisation  of  good  scientific  schemes  for 
encoding the pragmatics of information, and for information
handling;  and  as  a  methodology  of  doing  science,  scientific 
discovery,  and  scientific  and  technological  forecasting. 

3. Before expanding on these two points, it must be clear that the
technology  of  'information'  is  not  being  discussed.  The  physical  achievement 
of  mechanizing  information  problems  -  pattern  recognition  and  interpretation, 
machine  translation,  machine  search  and  retrieval,  encoding  and  decoding, 
automata 'computation,' 'command,' and 'control,' etc. - will be handled by
practical  engineers,  mostly  electrical  and  electronic,  some  mechanical;  and 
physical  scientists  occupied  in  development. 

4. Thus, to whatever degree an interdisciplinary science of information
can come into existence (just as communications science gradually
came into existence), it must serve as a theoretical and practical handmaiden
to communications engineering. (The practical handmaiden may involve training
and supplying working professionals capable of doing specific tasks, just
as the 'human factors engineer' was supplied by psychology, and the 'computer
programmer' by mathematics.)

What  theoretical  foundation  remains  to  the  information  sciences? 

a. It cannot be network analysis. The communications engineer
is quite skilled in network analysis - of a certain sort. It is only in such
a context as this report that it begins to become clear that the communications
and control engineer works with an impoverished theory. He is still
beholden to the network analysis of Kirchhoff and to mathematical techniques
developed or implied by Fourier and Laplace (i.e., summation of potentials
and fluxes, harmonic decomposition, transformation). The combination of
communications engineer and control engineer formalized the entire procedure in
the elementary concept of a block diagram. (This was proposed as the
generalization for the schematic circuit diagram.) However, even the chemical
engineer knew better in his flow chart, though he allowed it to degenerate to a
block diagram. The basic problem, as each problem in the information sciences
shows, is that there is need to develop a method of analysis of systems that
can illustrate their hierarchical nature, and that can show how each set is
complete and forms a mathematical group among all possible systems of like
analytic nature in the real world. To make the point clearer, it is best to
illustrate it.


(1)  Maxwell  and  Boltzmann  and  Gibbs  showed  finally  how  the 
problem  of  atomistic  function  transforms  into  ensemble  form. 

(2) The problem was done over and over again - by the biologist
in the genetic problem, by Einstein in Brownian motion, by Nyquist in
the electrical network, by Shannon in 'syntactic' information theory, in the
framework of Hegelian dialectics, etc.

(3) We can recognize the steps in our own work. It led
from a dissatisfaction with electrical network analysis as a general analytic
analogue for all networks, because of non-linear mechanical exposure, to the
illustrative example of turbulence in the hydrodynamic field, by which we
showed how the spectrum of atomic properties leads to the phenomenological
equations of change, which lead in turn to the 'atomistic' properties of the
spectrum of turbulence, with the growth in understanding that this was the first
dynamic-physical-mathematical 'proof' of the reality of such a hierarchical
link. This was the element that puts hierarchical systems for science into
the philosophic and scientific perspective that was contained in the
hypothesis stated in (1), "At every size level, stability conditions, arising
from order-disorder criteria involving the 'atomistic' oscillator level,
break down the stability. ... Then a new super-atom develops and a
super-organization of atoms grows," that we were probing for in our 1957 "Philosophy
for Mid-Twentieth Century Man," and that as a result of this study and the
January 1965 New York Academy Meeting on Perspectives in Time have led us to
realize may be the direction out of two-valued logic problems as a pragmatic
ordering added to Russell's theory of types, and perhaps helps to resolve any
paradoxes associated with the mind-body problem.

The problem we see is to embed each scientific problem into
the highest ordered 'space' as a canonical system in which it forms a group
that is narrowly distributed in a hypershell like Gibbs' canonical or
microcanonical ensemble. In this space, the systems are then 'stationary' and
ergodic. The system cannot change its base of communication. (We are afraid
that our words will be viewed by some purists as Malapropian conversation,
which it partly is. However, what we are expressing, though vaguely and
imperfectly, is the kind of logic by which each systems level is embedded in a
higher systems description. In past days, one would have philosophically said
that each embedding logic has nothing to do with the successive one, i.e.,
the meta-language is not cast in the same axiomatic structure as the calculus
under discussion. However, we now believe that there may exist a systematic
common linking. This is what we are driving toward.)

However, this cannot be done today as a generalization (although
the mathematician may think he can) in any meaningful way. Thus the
systems embedding will have to be explored in a systematic way. In our view,
as described in (1), there is a hierarchy of problems that range from a
re-examination of the electrical network problem to the brain by other than
single level block diagrams. This can be the central task in information
sciences.


In  our  view,  thus,  an  information  scientist  of  the  future 
could  be  a  person  capable  of  developing  the  super  block-diagram-of-the-future 
for any particular technical problem. He can deal with the 'signs' and
'signals' of the problem.

b. The 'semantics' of information. This includes the codification,
storage, transmission, and retrieval of information of interest to the
human. What is true about reality in minimal redundant fashion might be
considered to be the keynote of this branch of information sciences of the future.

In this field, the problem is not to be the generator or user
of the information, but to be the information transport and handling linkage.
However, the link is not a 'clerical' one (as the network problem might be
viewed, since the information theory expert in the first field should have a
repertoire of 'clerical' routines for systems analysis - this is what we have),
but a 'semantic' one. What is the most unique relation between information
and that which is designated?

c. A third field, of the 'pragmatics' of information, namely
what the generator or user meant, is outside of the scope of the information
sciences. To admit this field would be to want information sciences to take
over all sciences, and that it cannot do.

5.  What  does  this  mean  to  the  Army  in  general,  or  ARO  in  particular, 
as  patron  and  user?  At  most  we  can  only  suggest;  in  fact,  it  is  our  duty  to 
do so.


The problems that the Army faces, similar to those of the other services
and some other facets of government, that created involvement with the
information sciences are:


a.  the  compact  command  and  control  computer  for  field  use  of 
remote  self-guiding  vehicles  and  weapons 

b. the logistics computer (which is no problem in that it can
easily  be  in  the  line  of  current  business  computer  development) 

c. the limited purpose strategy computer, or how to integrate
the  factors  in  limited  purpose,  limited  boundary  war  and  peace  games 

d.  the  'intelligence'  computers,  suitable  for  such  tasks  as 
coding-decoding,  information  search  and  correlation,  pattern  recognition 

e. communications systems, in the sense of providing the necessary
channels and capacity in a given situation, rather than an older view
of  reeling  out  some  telephone  wire 

f.  a  general  purpose  command  and  decision  computer  with  greater 
capability than the individual's or small group's brain to integrate all the
pertinent  factors  in  a  longer  space  and  time  situation. 

g.  a  system  for  providing  needed  technical  information. 

6. Obviously many of the needs are common with many other government
agencies and should be subject to common attack or support. Consider a
few interesting common problems.

'Information' is defined in three senses: one, of whatever comes
up next to the casual observer; two, of whatever comes up with stochastic
indeterminacy from a deterministic stationary universe; three, of whatever comes
up from an indeterministic universe. If a problem appears stochastic but
is really deterministic, it is not an information theory problem, but a
scientific problem. This is to be handled by scientists attempting to put a
scientific foundation under the problem. This is not one of the common needs
in information science.

The common needs lie in searching strategies that are common to
all stochastic information problems from an ergodic universe. Reading mail,
patent searching, processing intelligence data, handling traffic, etc., all
have these problems in common. The common problem is general network or
system analysis. If this can be done, then how these general systems handle
stochastic inputs is quite well developed. The connection is the following:
if  one  knows  the  network  characteristics  and  analysis  in  the  brain,  i.e.,  how 
it  handles  standard  inputs,  then  one  can  tell  what  it  will  do  most  generally 
with  stochastic  inputs. 
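
The connection stated above has a standard closed form in one special case.
As a sketch only, assume (the report does not) that the network is linear and
time-invariant with impulse response h(t), and that the input x(t) is a
wide-sense stationary random process with power spectral density S_xx. The
symbols h, H, x, y, S_xx, and S_yy are introduced here for illustration and do
not appear in the report.

```latex
y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t-\tau)\, d\tau ,
\qquad
H(i\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-i\omega t}\, dt ,
\qquad
S_{yy}(\omega) = \lvert H(i\omega) \rvert^{2}\, S_{xx}(\omega) .
```

Here H is the response of the network to a standard sinusoidal input, so that
knowing it fixes the output spectrum for any stationary stochastic input. For
non-linear networks no such simple relation holds in general.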

There is the common business machine problem. No comments are
needed. There are the procurement problems common in many areas. It is quite
clear that a common logic for handling such problems is needed. Many of the
intelligence problems are quite similar among the services, and it may be
presumed that efforts in this area are common. It appears that a certain degree
of casual correlation in all such activities has existed among ARO, ONR, and
AFOSR. Of these three groups, it may be that ARO is perhaps most lagging in
internal exploitation of the information sciences. However, other branches of
the Army, particularly electronic, seem to have had considerable contact with
the field.


The broader command and control information machine is, of course,
of interest to all establishment power structures. However, its great
indeterminacy makes it a subject for competition rather than cooperation. Perhaps this
is best; it certainly can provoke different points of view in seeking to
discover answers. We personally relish the competition. The search is kept
viable.


7.  What  is  special  for  the  Army? 

a.  What  information  adjuncts  should  the  self-contained  soldier 
of  the  future  have?  (He  has  a  different  scope  and  range  than  does  the  man  in 
the  air  or  space  or  water.) 

b.  What  are  the  local  communications  possibilities  -  both  for 
maximum  communication  with  possible  channels,  and  for  maximum  lack  of 
detection? 


c.  What  man-machine  integrations  are  most  plausible  and  useful? 

d.  Geopolitics  of  war  and  peace. 